Amendment and Response

Serial No.: 09/898,238 
Confirmation No.: 7517 
Filed: July 3, 2001 

For: ISOLATED AND PURIFIED DNA MOLECULE AND PROTEIN FOR THE DEGRADATION OF 
TRIAZENE COMPOUNDS 

Remarks 

The Office Action mailed August 26, 2002 has been received and reviewed.
With claims 7, 9, 10, 24, 25, and 27 amended and claims 5 and 6 canceled, the pending claims
are claims 7-10, 17-18, and 24-27. Claims 17-18 have been withdrawn from consideration.
Reconsideration and withdrawal of the rejections are respectfully requested.

The amendments of claims 9 and 10 are supported by originally filed claims 1 and
3, respectively.

Claims 24 and 25 have been amended to correct a typographical error. 
The amendments of claims 25 and 27 are supported by the specification at, for
instance, page 10, line 4 through page 11, line 4.

Information Disclosure Statement 

Applicants note that a document cited on the Information Disclosure Statement 
received by the Office on April 3, 2002, was not initialed or crossed out. This document, foreign 
patent document WO 95/01437, is listed on a Form 1449, which is included herewith, and a copy of the
document is also included for the Examiner's convenience. Consideration of the document listed 
on the attached 1449 form(s) is respectfully requested. Pursuant to the provisions of MPEP §609, 
Applicants further request that a copy of the 1449 form, marked as being considered and 
initialed by the Examiner, be returned with the next Official Communication. 

It is believed that no fee is due for the consideration of this document, as it was 
originally submitted to the Office prior to the receipt of any Action on the merits. However, in 
the event a fee is due, please charge any fee or credit any overpayment to Account No. 13-4895. 

Specification 

The specification has been amended at page 1, line 4, to reflect the issuance of 
Serial No. 08/546,793. The Examiner is requested to withdraw the objection to the disclosure. 
Claim Objections

Claim 7 has been amended to delete "(SEQ. ID NO:2)" and insert "(SEQ ID
NO:2)" therefor. The Examiner is requested to withdraw the objection to claim 7.

The 35 U.S.C. §112, Second Paragraph, Rejection

The Examiner rejected claims 5, 9, 10, 25, and 27 under 35 U.S.C. §112, second
paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject
matter which Applicants regard as the invention.

Claim 5 has been canceled. Claims 9 and 10 have been amended to no longer 
depend upon canceled claims. 

This rejection, as it relates to claims 25 and 27, is traversed. It is respectfully 
submitted that claims 25 and 27 are not indefinite. In the interests of furthering prosecution, 
claims 25 and 27 have been amended as follows. Claims 25 and 27 have been amended to recite 
an "isolated and purified protein and biologically active derivatives thereof, wherein the protein 
and the biologically active derivatives thereof convert atrazine to hydroxyatrazine . . . ." 

In response to this rejection, claim 25 has also been amended to recite "wherein
the protein and the biologically active derivatives thereof comprise an amino acid sequence
encoded by a DNA molecule . . . ," and claim 27 has been amended to recite "wherein the protein
and the biologically active derivatives thereof comprise an amino acid sequence having greater
than about 80% sequence identity . . . ."

Reconsideration and withdrawal of the rejection of claims 9-10, 25, and 27 under 
35 U.S.C. §112, second paragraph, as being indefinite is respectfully requested. 

The 35 U.S.C. §112, First Paragraph, Written Description Rejection 

The Examiner rejected claims 5, 6, 25 and 27 under 35 U.S.C. §112, first 
paragraph, as containing subject matter which was not described in the specification in such a 
way as to reasonably convey to one skilled in the relevant art that the inventor(s), at the time the 
application was filed, had possession of the claimed invention. Claims 5 and 6 have been
canceled. This rejection is respectfully traversed. 

To begin with, the Examiner is requested to note that the Action asserts "while
applicants recite an activity for the biologically active derivatives thereof, the claimed
derivatives have no defined structure" (Action, page 6, first full paragraph). As discussed above
in the response to the rejection based on §112, second paragraph, claims 25 and 27 have been
amended to clearly show that both the protein and the biologically active derivatives thereof
"comprise an amino acid sequence encoded by a DNA molecule having a complement that
hybridizes to a DNA having the sequence shown in Figure 6 (SEQ ID NO:1) . . ." (claim 25) or
"comprise an amino acid sequence having greater than about 80% sequence identity . . . ." (claim 27).

The Examiner is further requested to note that the Action asserts "while claims 25
and 27 are further drawn to a genus of proteins which are structurally defined as they relate to
SEQ ID NO:1 or 2, these genuses of proteins (of claims 25 and 27) are not defined functionally"
(Action, page 6, first full paragraph). As discussed above in the response to the rejection based
on §112, second paragraph, claims 25 and 27 have been amended to recite "[a]n isolated and
purified protein and biologically active derivatives thereof, wherein the protein and the
biologically active derivatives thereof convert atrazine to hydroxyatrazine . . . ."

To meet the written description requirement of 35 U.S.C. §112, first paragraph, 
the application "must convey with reasonable clarity to those skilled in the art that, as of the 
filing date sought, he or she was in possession of the invention, i.e., what is now claimed." 
M.P.E.P. §2163. Factors to be considered in determining whether there is sufficient evidence of 
possession include:

"sufficient description of a representative number of species by actual reduction
to practice . . . , reduction to drawings . . . , or by disclosure of relevant identifying
characteristics, i.e., structure or other physical and/or chemical properties, by
functional characteristics coupled with a known or disclosed correlation between
function and structure, or by a combination of such identifying characteristics,
sufficient to show the applicant was in possession of the claimed genus."
M.P.E.P. §2163(II)(A)(3)(a)(ii).

Claim 25 is directed to 

"an isolated and purified protein and biologically active derivatives thereof, wherein the 
protein and the biologically active derivatives thereof convert atrazine to 
hydroxyatrazine, wherein the protein and the biologically active derivatives thereof 
comprise an amino acid sequence encoded by a DNA molecule having a complement that 
hybridizes to a DNA having the sequence shown in Figure 6 (SEQ ID NO:1), beginning
at position 236 and ending at position 1655, under the stringency conditions of
hybridization in buffer containing 0.25 M Na2HPO4, 7% SDS, 1% BSA, 1.0 mM EDTA
at 65°C, followed by washing with 0.1% SDS and 0.1x SSC at 65°C."

Thus, the protein and biologically active derivatives of claim 25 are defined by both physical
characteristics (wherein the protein and the biologically active derivatives thereof comprise an
amino acid sequence encoded by a DNA molecule having a complement that hybridizes to a DNA
having the sequence shown in Figure 6 (SEQ ID NO:1), beginning at position 236 and ending at
position 1655, under the stringency conditions of hybridization in buffer containing 0.25 M
Na2HPO4, 7% SDS, 1% BSA, 1.0 mM EDTA at 65°C, followed by washing with 0.1% SDS and
0.1x SSC at 65°C) and functional characteristics (wherein the protein and the biologically active
derivatives thereof convert atrazine to hydroxyatrazine).

Claim 27 is directed to 

"an isolated and purified protein and biologically active derivatives thereof, wherein the 
protein and the biologically active derivatives thereof convert atrazine to 
hydroxyatrazine, wherein the protein and the biologically active derivatives thereof 
comprise an amino acid sequence having greater than about 80% sequence identity to the 
amino acid sequence depicted at SEQ ID NO:2." 

Thus, as with the protein and biologically active derivatives of claim 25, the protein and 
biologically active derivatives of claim 27 are defined by both physical (wherein the protein and 
the biologically active derivatives thereof comprise an amino acid sequence having greater than 
about 80% sequence identity to the amino acid sequence depicted at SEQ ID NO:2) and 
functional characteristics (wherein the protein and the biologically active derivatives thereof
convert atrazine to hydroxyatrazine). 
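Claims 26 and 27 turn on percent sequence identity to SEQ ID NO:2, but neither the claims nor these remarks state how identity is computed. As a purely illustrative aid, the following minimal Python sketch (an assumption; it is not drawn from the specification) computes percent identity from a pair of sequences that have already been aligned. The function names `percent_identity` and `exceeds_threshold`, the gap character, and the convention of counting gap columns in the denominator are hypothetical choices; other conventions give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumed convention: identical non-gap columns divided by the total
    number of alignment columns (gap columns count in the denominator).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap  # a shared gap is not an identity
    )
    return 100.0 * identical / len(aligned_a)


def exceeds_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """Hypothetical check for 'greater than' a given identity threshold."""
    return percent_identity(aligned_a, aligned_b) > threshold


if __name__ == "__main__":
    # Toy peptide alignment: 4 of 5 columns identical.
    print(percent_identity("MKV-L", "MKVAL"))
```

The sketch deliberately leaves out the alignment step itself, since the choice of alignment program and scoring parameters is exactly what it cannot assume; "about" in the claim language is likewise not modeled.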

Applicants respectfully submit that the description of the genus of proteins and 
biologically active derivatives thereof by physical and/or chemical properties and by functional 
characteristics includes sufficient description of a representative number of species. Applicants 
further submit that the specification includes a description of the actual reduction to practice of 
an isolated and purified protein that converts atrazine to hydroxyatrazine, and a DNA molecule 
encoding the protein. Applicants' Representatives respectfully point out that the M.P.E.P. recites 
actual reduction to practice of species as only one factor that may satisfy the written description 
requirement of a claimed genus. Furthermore, the M.P.E.P. recognizes "situations where one 
species may adequately support a genus" (M.P.E.P. §2163(II)(A)(3)(a)(ii)). Moreover, the 
"Revised Interim Written Description Guidelines Training Materials" issued by the United States 
Patent and Trademark Office (http://www.uspto.gov/web/menu/written.pdf), recognizes claims 
where a "single embodiment is representative of the genus," and that "one of skill in the art 
would recognize that applicant was in possession of all of the various . . . methods necessary to 
practice the invention." Applicants respectfully submit that a representative number of
adequately described species have been disclosed to satisfy the written description requirement
of 35 U.S.C. §112, first paragraph, for the genus of proteins and biologically active derivatives
thereof recited in claims 25 and 27.

Thus, Applicants respectfully submit that the specification includes a written
description sufficient to reasonably convey to one of skill in the art their possession of the
claimed invention. Applicants respectfully request that the Examiner reconsider and withdraw
the rejection under 35 U.S.C. §112, first paragraph.
The 35 U.S.C. §112, First Paragraph, Enablement Rejection

The Examiner rejected claims 5, 6, 25, and 27 under 35 U.S.C. §112, first 
paragraph, as containing subject matter which was not described in the specification in such a 
way as to enable one skilled in the art to which it pertains, or with which it is most nearly 
connected, to make and/or use the invention. This rejection is respectfully traversed. 

The Action asserts that the specification "does not reasonably provide enablement
for . . . any protein that converts atrazine to hydroxyatrazine, wherein the protein has a molecular
weight of about 245 kilodaltons and is a homotetramer" (Action, page 7, second full paragraph).
Claims 5 and 6 have been canceled, thereby rendering moot the aspect of the rejection based on 
"any protein that converts atrazine to hydroxyatrazine, wherein the protein has a molecular 
weight of about 245 kilodaltons and is a homotetramer." 

The Action asserts that the specification "does not reasonably provide enablement 
for any biologically active derivative that converts atrazine to hydroxyatrazine . . . ." (Action, 
page 7, second full paragraph), and "[t]he biologically active derivatives of claims 25 and 27 
rejected under this section of U.S.C. 112, first paragraph, do not place any structural limits on 
the claimed enzymes, while the remaining portion of the genus of these claims is defined 
structurally, but is not defined functionally" (Action, page 8, second full paragraph). 

The Examiner is requested to note that, as discussed above in the response to the 
rejection based on §112, second paragraph, claims 25 and 27 have been amended to clearly show
that biologically active derivatives thereof "comprise an amino acid sequence encoded by a DNA
molecule having a complement that hybridizes to a DNA having the sequence shown in Figure 6
(SEQ ID NO:1) . . ." (claim 25) or "comprise an amino acid sequence having greater than about
80% sequence identity . . . ." (claim 27). Thus, claims 25 and 27 are not directed to "any
biologically active derivative that converts atrazine to hydroxyatrazine" (Action, page 7, second 
full paragraph), but are directed to biologically active derivatives that convert atrazine to 
hydroxyatrazine and have the structure recited in claim 25 or claim 27. 
The Examiner is further requested to note that, as discussed above in the response
to the rejection based on §112, second paragraph, claims 25 and 27 have been amended to recite
"[a]n isolated and purified protein and biologically active derivatives thereof, wherein the
protein and the biologically active derivatives thereof convert atrazine to hydroxyatrazine . . . ."

In view of the amendments to claims 25 and 27 to define the genus structurally 
and functionally, it is respectfully submitted that the rejection of claims 25 and 27 as not enabled 
by the specification is rendered moot. The first paragraph of §112 requires no more than a
disclosure sufficient to enable the skilled worker to carry out the invention commensurate with 
the scope of the claims. The present application provides detailed guidance for identifying 
proteins and biologically active derivatives thereof that fall within the scope of the claims. For 
instance, the specification provides assays that one of skill in the art can use to determine if a 
biologically active derivative converts atrazine to hydroxyatrazine (specification at, for instance, 
page 23, line 9 through page 24, line 15). It is respectfully submitted that upon reading 
Applicants' detailed specification, the skilled worker would be able to carry out the invention
commensurate with the scope of the claims, and identify biologically active derivatives having 
the recited structure and function. 

The Examiner is respectfully requested to reconsider and withdraw the rejection 
of claims 25 and 27 under 35 U.S.C. §112, first paragraph.

The 35 U.S.C. §102 Rejection

The Examiner rejected claims 5-7, 9, 10, and 24-27 under 35 U.S.C. § 102(a) as 
being anticipated by Mandelbaum et al., (Applied and Environmental Microbiology, Vol. 61, 
No. 4, pgs. 1451-1457, Apr. 1995) as evidenced by DeSouza et al. (Journal of Bacteriology, Vol. 
178, No. 16, pgs. 4894-4900, Aug. 1996). This rejection is respectfully traversed. 

This rejection is based on the premise that "[t]he preparation of cell extracts of
Pseudomonas sp. as taught by Mandelbaum et al. constitutes the 'isolation and purification' of
those enzymes and proteins in the cell extract." Mandelbaum et al. teach that crude cell extracts
were prepared by passing washed cells three times through a French press, centrifuging the
lysate, and passing the resulting supernatant through a cellulose acetate filter (Mandelbaum et al.
at page 1452, col. 1, last full paragraph). This procedure results in a lysate that includes most, if
not all, of the soluble proteins of the cell. It is respectfully submitted that this disclosure by
Mandelbaum et al. does not constitute "isolated and purified." Further, the Action admits this in
its characterization of Mandelbaum et al. at page 13 of the Action, where it states that "[o]ne of
ordinary skill in the art would have been motivated to further isolate and purify the atrazine
chlorohydrolase that Mandelbaum et al was in possession of . . ." (emphasis added). As
Mandelbaum et al. do not teach an isolated and purified protein that converts atrazine to
hydroxyatrazine, the cited document does not teach each element of the claims. Thus,
Mandelbaum et al. do not anticipate the pending claims.

Reconsideration and withdrawal of the rejection of claims 5-7, 9, 10, and 24-27
under 35 U.S.C. §102(a) as being anticipated by Mandelbaum et al. as evidenced by DeSouza et al. is
respectfully requested.

The 35 U.S.C. §103 Rejection 

The Examiner rejected claim 8 under 35 U.S.C. §103 as being unpatentable over 
Mandelbaum et al., (Applied and Environmental Microbiology, Vol. 61, No. 4, pgs. 1451-1457, 
Apr. 1995) and Kennedy (Kennedy et al., "Principles of immobilization of enzymes," Handbook 
of Enzyme Biotechnology, 3rd Edition, Wiseman, ed., Ellis Horwood Limited, Hertfordshire,
Great Britain, pp. 235-310 (1995)). 

The burden is on the Office to establish a prima facie case of obviousness of
the claimed invention, and it is respectfully argued that the Office has fallen short of meeting this
burden. The three criteria that must be met by the Office to establish a prima facie case of
obviousness are:

(i) there must be a suggestion or motivation, either in the 
references themselves or in the knowledge generally available to one of 
ordinary skill in the art, to modify the reference or to combine reference
teachings; 

(ii) there must be a reasonable expectation of success; and 

(iii) the prior art reference (or references when combined) must 
teach or suggest all the claim limitations. 

Applicants respectfully submit that the requisite motivation to combine the two 
cited documents cannot be found in either document. It is axiomatic that motivation to combine 
two documents cannot be attributed to the combination itself, and the Examiner has not shown 
the existence in either cited reference of a motivation to combine the disclosures to produce the 
claimed invention. 

Mandelbaum et al. disclose the isolation and characterization of a Pseudomonas 
sp. that mineralizes the s-triazine herbicide atrazine. Mandelbaum et al. teach that crude cell 
extracts were prepared by passing washed cells three times through a French press, centrifuging 
the lysate, and passing the resulting supernatant through a cellulose acetate filter (Mandelbaum 
et al. at page 1452, col. 1, last full paragraph). The authors state that the "studies with liquid 
cultures indicate the potential for Pseudomonas sp. strain ADP to completely metabolize the
s-triazine ring of atrazine to CO2, but this remains to be established for cultures in soil" (page 1456,
col. 1, last paragraph). Mandelbaum et al. do not teach or suggest an isolated and purified
protein having the amino acid sequence shown in Figure 7 (SEQ ID NO:2), or binding an isolated
and purified protein to an immobilization support.

Kennedy pertains to principles of immobilization of enzymes. Kennedy discloses 
that "most major applications proposed to date have involved the use of hydrolases .... The 
main areas of application are in the food . . . and pharmaceutical industries . . . but other potential 
areas include waste treatment" (Kennedy, page 292, second full paragraph, emphasis added). In 
the section entitled "Future trends," Kennedy discloses that "[t]reatment of waste water . . . [is]
expected to be subjected to the application of immobilization technology" (Kennedy, page 293,
first full paragraph). Further, Kennedy is not directed to the immobilization of enzymes present
in crude cell extracts, and states that "a number of economic problems [are] inherent to
immobilized enzymes - namely the isolation, extraction, and purification of the enzyme in
sufficient quantities . . . ." (page 293, col. 2, second full paragraph). Kennedy does not teach or
suggest an isolated and purified protein having the amino acid sequence shown in Figure 7 (SEQ
ID NO:2).

Applicants submit that a fair reading of Mandelbaum et al. would certainly not
lead one of skill in the art to combine the crude cell extract disclosed therein with the use
disclosed by Kennedy. Mandelbaum et al. teach the isolation and characterization of a
microbe and the preparation of crude cell extracts. Mandelbaum et al. contain no teaching of how to
isolate and purify the claimed protein, nor of the sequence of the claimed protein; thus, a skilled
person would not be motivated to combine Mandelbaum et al. with the use disclosed by
Kennedy.

Likewise, a skilled person reading Kennedy would not be motivated to modify 
Kennedy to use the crude enzyme extract disclosed by Mandelbaum et al. Kennedy does not 
teach or suggest the immobilization of enzymes that are not isolated and purified. 

Applicants further submit there is no reasonable expectation of success even if the 
two documents are combined. The Action states that "the reasonable expectation of success 
comes from Kennedy who teach the industrial use of other immobilized hydrolases in waste 
treatment processes" (Action, page 13, second full paragraph). This statement is not an accurate 
characterization of the cited document. Kennedy discloses that waste treatment is a potential 
application for immobilized enzymes (Kennedy, page 292, second full paragraph) and that the 
treatment of waste water is expected to be subjected to the application of immobilization 
technology (Kennedy, page 293, first full paragraph). Thus, this suggestion by Kennedy is 
nothing more than an invitation to conduct further unspecified experiments. 

Finally, even if the documents are combined, they do not teach or suggest all the
elements of the claims. Neither Mandelbaum et al. nor Kennedy teaches or suggests an isolated
and purified protein having the amino acid sequence shown in Figure 7 (SEQ ID NO:2), or
binding such a protein to an immobilization support.

The Examiner is requested to reconsider and withdraw the rejection of claim 8 
under 35 U.S.C. §103 as being unpatentable over the cited documents. 

Summary 

It is respectfully submitted that the pending claims 7-10, 17-18, and 24-27 are in
condition for allowance and notification to that effect is respectfully requested. The Examiner is
invited to contact Applicants' Representatives, at the below-listed telephone number, if it is
believed that prosecution of this application may be assisted thereby.

Respectfully submitted for 
WACKETT et al. 

By 

Mueting, Raasch & Gebhardt, P.A. 
P.O. Box 581415 
Minneapolis, MN 55458-1415 
Phone: (612)305-1220 
Facsimile: (612) 305-1228 
Customer Number 26813 




Date Attorney Name: David L. Provence 

Reg. No.: 43,022 
Direct Dial: (612)305-1005 

CERTIFICATE UNDER 37 CFR §1.10:
"Express Mail" mailing label number: EV 183608491 US Date of Deposit: November 2002
The undersigned hereby certifies that this paper is being deposited with the United States Postal Service
"Express Mail Post Office to Addressee" service under 37 CFR §1.10 on the date indicated above and is
addressed to the Assistant Commissioner for Patents, Washington, D.C. 20231.



APPENDIX A - SPECIFICATION/CLAIM AMENDMENTS 
INCLUDING NOTATIONS TO INDICATE CHANGES MADE 

Serial No.: 09/898,238 

Docket No.: 110.00230102 

Amendments to the following are indicated by underlining what has been added 
and bracketing what has been deleted. Additionally, all amendments have been shaded. 



In the Specification 

At page 1, beginning at line 4, insert the following new paragraph: 



This application is a division of U.S. Patent Application Serial No. 08/546,793, 
filed on October 23, 1995, [(pending)] (U.S. Patent No. 6,284,522), which is incorporated herein
by reference in its entirety. 

In the Claims 

For convenience, all pending claims are shown below. 

7. (Amended) An [The] isolated and purified protein [of claim 5,] which has the
amino acid sequence shown in Figure 7 (SEQ[.] ID NO:2).

8. The isolated and purified protein of claim 7 bound to an immobilization support. 

9. (Amended) An isolated and purified atrazine chlorohydrolase protein encoded
by [the] a DNA molecule [of claim 1] that hybridizes to DNA complementary to DNA having
the sequence shown in Figure 6 (SEQ ID NO:1), beginning at position 236 and ending at
position 1655, under the stringency conditions of hybridization in buffer containing 0.25 M
Na2HPO4, 7% SDS, 1% BSA, 1.0 mM EDTA at 65°C, followed by washing with 0.1% SDS and
0.1x SSC at 65°C.

10. (Amended) An isolated and purified protein encoded by [the] a DNA molecule
[of claim 3] having the nucleotide sequence shown in Figure 6 (SEQ ID NO:1), beginning at
position 236 and ending at position 1655.

17. A method for the purification of atrazine chlorohydrolase in at least about 90%
yield consisting of a step of adding ammonium sulfate to an aqueous cell-free extract of an 
atrazine chlorohydrolase-containing bacterium. 

18. The method of claim 17 wherein ammonium sulfate is added in an amount of no 
greater than about 20% of saturation. 

24. (Amended) An isolated and purified protein that converts atrazine to
hydroxyatrazine, wherein the protein comprises an amino acid sequence encoded by a DNA
molecule having a [compliment] complement that hybridizes to a DNA having the sequence
shown in Figure 6 (SEQ ID NO:1), beginning at position 236 and ending at position 1655, under
the stringency conditions of hybridization in buffer containing 0.25 M Na2HPO4, 7% SDS, 1%
BSA, 1.0 mM EDTA at 65°C, followed by washing with 0.1% SDS and 0.1x SSC at 65°C.

25. (Amended) An isolated and purified protein and biologically active derivatives
thereof, wherein the protein and the biologically active derivatives thereof [that] convert atrazine
to hydroxyatrazine, wherein the protein and the biologically active derivatives thereof
comprise[s] an amino acid sequence encoded by a DNA molecule having a [compliment]
complement that hybridizes to a DNA having the sequence shown in Figure 6 (SEQ ID NO:1),
beginning at position 236 and ending at position 1655, under the stringency conditions of
hybridization in buffer containing 0.25 M Na2HPO4, 7% SDS, 1% BSA, 1.0 mM EDTA at 65°C,
followed by washing with 0.1% SDS and 0.1x SSC at 65°C.

26. An isolated and purified protein that converts atrazine to hydroxyatrazine, 
wherein the protein comprises an amino acid sequence having greater than about 80% sequence 
identity to the amino acid sequence depicted at SEQ ID NO:2. 
27. (Amended) An isolated and purified protein and biologically active derivatives
thereof, wherein the protein and the biologically active derivatives thereof [that] convert atrazine
to hydroxyatrazine, wherein the protein and the biologically active derivatives thereof
comprise[s] an amino acid sequence having greater than about 80% sequence identity to the
amino acid sequence depicted at SEQ ID NO:2.
(57) Abstract

The present invention provides an isolated DNA molecule of the autosomal dominant spinocerebellar ataxia type 1 gene, which is
located within the short arm of chromosome 6. This isolated DNA molecule is preferably located within a 3.36 kb EcoRI fragment, i.e., an
EcoRI fragment containing about 3360 base pairs, of the SCA1 gene. The isolated sequences contain a CAG repeat region. The number of
CAG trinucleotide repeats (n) is ≤ 36, preferably n = 19-36, for normal individuals. For an affected individual n > 36, preferably n ≥ 43.
SEQUENCE FOR SPINOCEREBELLAR ATAXIA TYPE 1
AND METHOD FOR DIAGNOSIS

Statement of Government Rights
The present invention was made with government support under
Grant Nos. NS 22920 and 27699, awarded by the National Institutes of Health. The
Government has certain rights in this invention.

Background of the Invention

The spinocerebellar ataxias are a heterogeneous group of
degenerative neurological disorders with variable clinical features resulting from
degeneration of the cerebellum, brain stem, and spinocerebellar tracts. The clinical
symptoms include ataxia, dysarthria, ophthalmoparesis, and variable degrees of
motor weakness. The symptoms usually begin during the third or fourth decade of
life; however, juvenile onset has been identified. Typically, the disease worsens
gradually, often resulting in complete disability and death 10-20 years after the
onset of symptoms. Individuals with juvenile onset spinocerebellar ataxias,
however, typically have more rapid progression of the phenotype than the late onset
cases. A method for diagnosing spinocerebellar ataxias would provide a significant
step toward their treatment.

Spinocerebellar ataxia type 1 (SCA1) is an autosomal dominant
disorder which is genetically linked to the short arm of chromosome 6 based on
linkage to the human major histocompatibility complex (HLA). See, for example,
H. Yakura et al., N. Engl. J. Med., 221, 154-155 (1974); and J.F. Jackson et al.,
N. Engl. J. Med., 296, 1138-1141 (1977). SCA1 has been shown to be tightly linked to
the marker D6S89 on the short arm of chromosome 6, telomeric to HLA. See, for
example, L.P.W. Ranum et al., Am. J. Hum. Genet., 42, 31-41 (1991); and H.Y.
Zoghbi et al., Am. J. Hum. Genet., 42, 23-30 (1991). Recently, two families with
dominantly inherited ataxia failed to show detectable linkage with HLA markers but
were found to have SCA1 when studied for linkage to D6S89, demonstrating the
superiority of the latter marker for study of ataxia families. See, for example, B.J.B.
Keats et al., Am. J. Hum. Genet., 42, 972-977 (1991). The identification and
cloning of the SCA1 gene could provide methods of detection that would be
extremely valuable for both family counseling and planning medical treatment.
Summary of the Invention

The present invention is directed to a portion of an isolated 1.2-Mb
region of DNA from the short arm of chromosome 6 containing a highly
polymorphic CAG repeat region in the SCA1 gene. This CAG repeat region is
unstable (i.e., highly variable within a population) and is expanded in individuals
with the autosomal dominant neurodegenerative disorder spinocerebellar ataxia type
1 (i.e., affected individuals generally have more than 36 CAG repeats). Southern
and PCR analyses of the CAG repeat region demonstrate correlation between the
size of the expanded repeat region and the age-of-onset of the disorder (with larger
alleles, i.e., more repeat units, occurring in juvenile cases), and severity of the
disorder (with larger alleles, i.e., more repeat units, occurring in the more severe
cases).

Specifically, the present invention provides a nucleic acid molecule
containing a CAG repeat region of an isolated autosomal dominant spinocerebellar
ataxia type 1 gene (herein referred to as "SCA1"), which is located within the short
arm of chromosome 6. The SCA1 gene contains a region that encodes a protein
herein referred to as "ataxin-1." The nucleic acid molecule of the present invention
can be a single or a double-stranded polynucleotide. It can be genomic DNA,
cDNA, or mRNA of any size as long as it includes the CAG repeat region of an
isolated SCA1 gene. Preferably, the nucleic acid molecule includes the SCA1
coding region and is about 2.4-11 kb in length. It can be the entire SCA1 gene
(whether genomic DNA or a transcript thereof) or any fragment thereof that contains
the CAG region of the gene. One such fragment is an EcoRI fragment of the SCA1
gene, i.e., a fragment obtained through digestion with EcoRI endonuclease
restriction enzyme, containing about 3360 base pairs having therein a polymorphic
CAG repeat region. By polymorphic CAG repeat region it is meant that there are
repeating CAG trinucleotides in this portion of the gene that can vary in the number
of CAG trinucleotides. The number of trinucleotide repeats can vary from as few as
19, for example, to as many as 81, for example, and larger.

For a normal individual, n ≤ 36 in the (CAG)n region, i.e., n = 2-36,
and typically n = 19-36. This region in a normal allele of the SCA1 gene is
optionally interrupted with CAT trinucleotides. Typically, there are no more than
about 3 CAT trinucleotides, either individually or in combination, within any
(CAG)n region. The (CAG)n region of this isolated sequence is unstable, i.e., highly
variable within a population, and larger, i.e., expanded, in individuals who have
symptoms of the disease or who are likely to develop symptoms of the disease. For
an affected individual, i.e., an individual with an affected allele of the SCA1 gene,
n > 36 in the (CAG)n region, and typically n > 43. One isolated DNA molecule of
the SCA1 gene is about 3360 base pairs in length, as shown in Figure 1. The
sequences of a portion of the EcoRI fragment within the SCA1 gene of several
affected individuals are shown in Figure 2. The entire 10,660 nucleotides of the
SCA1 gene transcript are shown in Figure 15 (the entire SCA1 gene spans about 450
kb of genomic DNA).
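
The repeat-length thresholds above (n ≤ 36 for normal alleles, n > 36 for affected alleles) can be illustrated with a short Python sketch. It is a simplified, hypothetical helper and not a validated diagnostic method: it scans a DNA sequence in all three reading frames, counts the longest uninterrupted run of CAG or CAT codons (so that the occasional CAT interruption described above does not break the count), and applies the 36-repeat cutoff. The function names and the choice to count CAT codons as repeat units are assumptions made for illustration.

```python
# Cutoff taken from the text: n <= 36 in a normal allele, n > 36 in an affected allele.
NORMAL_MAX_REPEATS = 36
REPEAT_CODONS = {"CAG", "CAT"}  # CAT may interrupt the (CAG)n tract


def count_repeat_units(seq: str) -> int:
    """Return the longest run of CAG/CAT codons found in any of the three frames.

    Simplification (assumption): every CAG or CAT codon in the run counts as
    one repeat unit, so a CAT codon directly adjacent to the tract in the
    flanking sequence would also be counted.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in REPEAT_CODONS:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def classify_allele(n_repeats: int) -> str:
    """Label an allele using the 36-repeat cutoff stated in the text."""
    return "normal" if n_repeats <= NORMAL_MAX_REPEATS else "affected"


if __name__ == "__main__":
    # Hypothetical test sequences, not real SCA1 sequence data.
    normal = "GGT" + "CAG" * 20 + "CAT" + "CAG" * 10 + "TGA"
    expanded = "A" + "CAG" * 45 + "TTA"
    for name, seq in [("normal", normal), ("expanded", expanded)]:
        n = count_repeat_units(seq)
        print(name, n, classify_allele(n))
```

Counting across a CAT interruption mirrors the text's statement that normal alleles may contain a few CAT trinucleotides within the (CAG)n region.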

The present invention is also directed to isolated oligonucleotides,
particularly primers for use in PCR techniques and probes for diagnosing the
neurodegenerative disorder SCA1. The oligonucleotides have at least about 11
nucleotides and hybridize to a nucleic acid molecule containing a CAG repeat
region of an isolated SCA1 gene. The hybridization can occur to any portion of a
nucleic acid molecule containing a CAG repeat region of the SCA1 gene.
Preferably, the oligonucleotides hybridize to a 3.36-kb EcoRI fragment of an SCA1
gene having a CAG repeat region. Alternatively stated, each oligonucleotide is
substantially complementary (having greater than 65% homology) to a nucleotide
sequence having a CAG repeat region, i.e., a (CAG)n region, preferably to a 3.36-kb
EcoRI fragment of the SCA1 gene. If the oligonucleotide is a primer, the molecule
preferably contains at least about 16 nucleotides and no more than about 35
nucleotides. Furthermore, preferred primers are chosen such that they produce a
primed product of about 70-350 base pairs, preferably about 100-300 base pairs.
More preferably, the primers are chosen such that their nucleotide sequence is
complementary to a portion of a strand of an affected or a normal allele within about
150 nucleotides on either side of the (CAG)n region, including directly adjacent to
the (CAG)n region. Most preferably, the primer is selected from the group
consisting of CCGGAGCCCTGCTGAGGT (CAG-a), CCAGACGCCGGGACAC
(CAG-b), AACTGGAAATGTGGACGTAC (Rep-1),
CAACATGGGCAGTCTGAG (Rep-2), CCACCACTCCATCCCAGC (GCT-435),
TGCTGGGCTGGTGGGGGG (GCT-214), CTCTCGGCTTTCTTGGTG (Pre-1),
and GTACGTCCACATTTCCAGTT (Pre-2). These primers substantially
correspond to those shown in Figure 3.
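
As a quick consistency check on the listed primers, the following Python sketch stores them as given (assuming they are written 5' to 3', the usual convention), checks their lengths against the preferred 16-35-nucleotide range, and computes reverse complements. Running it shows that Pre-2 is exactly the reverse complement of Rep-1, i.e., the two oligonucleotides correspond to the same site on opposite strands. The helper names are illustrative only.

```python
# Primer sequences as listed in the text (assumed to be written 5'->3').
PRIMERS = {
    "CAG-a": "CCGGAGCCCTGCTGAGGT",
    "CAG-b": "CCAGACGCCGGGACAC",
    "Rep-1": "AACTGGAAATGTGGACGTAC",
    "Rep-2": "CAACATGGGCAGTCTGAG",
    "GCT-435": "CCACCACTCCATCCCAGC",
    "GCT-214": "TGCTGGGCTGGTGGGGGG",
    "Pre-1": "CTCTCGGCTTTCTTGGTG",
    "Pre-2": "GTACGTCCACATTTCCAGTT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_preferred_length_range(seq: str, lo: int = 16, hi: int = 35) -> bool:
    """True if the primer length falls in the preferred 16-35 nt range from the text."""
    return lo <= len(seq) <= hi


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name:8s} {len(seq):3d} nt  preferred length: {in_preferred_length_range(seq)}")
    # Report any pair of listed primers that are reverse complements of each other.
    for a, seq_a in PRIMERS.items():
        for b, seq_b in PRIMERS.items():
            if a < b and reverse_complement(seq_a) == seq_b:
                print(f"{b} is the reverse complement of {a}")
```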
They can be used in any combination for sequencing or producing
amplified nucleic acid molecules, e.g., DNA molecules, using various PCR
techniques. Preferably, for amplification of the DNA molecule characteristic of the
SCA1 disorder, Rep-1 and Rep-2 are the primer pair used. As used herein, the term
"amplified DNA molecule" refers to DNA molecules that are copies of a portion of
DNA and its complementary sequence. The copies correspond in nucleotide
sequence to the original DNA sequence and its complementary sequence. The term
"complement", as used herein, refers to a DNA sequence that is complementary
(having greater than 65% homology) to a specified DNA sequence. The term
"primer pair", as used herein, means a set of primers including a 5' upstream primer
that hybridizes with the 5' end of the DNA molecule to be amplified and a 3'
downstream primer that hybridizes with the complement of the 3' end of the
molecule to be amplified.

Using the primers of the present invention, PCR technology can be
used in the diagnosis of the neurological disorder SCA1 by detecting a region of
more than about 36 repeating CAG trinucleotides, preferably at least 43 repeating
CAG trinucleotides. Generally, this involves treating separate complementary
strands of the DNA molecule containing a region of repeating CAG codons with a
molar excess of two oligonucleotide primers, extending the primers to form
complementary primer extension products that act as templates for synthesizing
the desired molecule containing the CAG repeating units, and detecting the
molecule so amplified.
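
The primer-pair definition and the PCR-based detection step described above can be illustrated with a toy in silico PCR in Python. This is a hypothetical sketch: the template, the 10-nt primers, and the flank lengths are invented for illustration and are not SCA1 sequences. The forward primer is matched directly on the top strand; the reverse primer is written on the opposite strand, so its site appears on the top strand as its reverse complement. Because every repeat unit adds 3 bp, the repeat count can be recovered from the product length once the fixed (non-repeat) part of the amplicon is known.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the product a fwd/rev primer pair would amplify, or None.

    The forward primer is searched for directly on the top strand; the
    reverse primer's site is searched for as its reverse complement,
    downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    site = reverse_complement(rev)
    end = template.find(site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(site)]


def repeats_from_length(product_bp: int, non_repeat_bp: int) -> int:
    """Each CAG unit adds 3 bp, so n = (product length - fixed length) / 3."""
    return (product_bp - non_repeat_bp) // 3


if __name__ == "__main__":
    fwd_site = "GATTACAGGC"  # hypothetical forward primer (top strand)
    rev_site = "TTGCCAAGTC"  # hypothetical top-strand site of the reverse primer
    rev = reverse_complement(rev_site)
    for n in (30, 45):
        template = "AAAA" + fwd_site + "CAG" * n + rev_site + "AAAA"
        product = amplicon(template, fwd_site, rev)
        estimate = repeats_from_length(len(product), len(fwd_site) + len(rev_site))
        print(n, len(product), estimate)
```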

An oligonucleotide that can be used as a gene probe for identifying a
nucleic acid molecule, e.g., a DNA molecule, containing a CAG repeat region of the
SCA1 gene is also provided. The gene probe can be used for distinguishing
between the normal and the larger affected alleles of the SCA1 gene. The gene
probe can be a portion of a nucleotide sequence of the SCA1 gene itself (e.g., a
3.36-kb EcoRI fragment or portion thereof), complementary to it, or hybridizable to
it or its complement. It is of a size suitable for forming a stable duplex, i.e., having
at least about 11 nucleotides, preferably at least about 15 nucleotides, more
preferably at least about 100 nucleotides (for effective Southern blotting),
and most preferably at least about 200 nucleotides. The probe can contain
any portion of the (CAG)n region, although this is not a requirement. It is desirable,
however, for the probe to contain a portion of the nucleic acid molecule on either
side of the (CAG)n region. There is generally no maximum size limitation for such
probes. In fact, the entire SCA1 gene could be a probe.

The gene probe of the present invention is usable in a method of
diagnosing a patient for SCA1. A particularly preferred method of diagnosis
involves detecting the presence of a DNA molecule containing a CAG repeat region
of the SCA1 gene. Specifically, the method includes the steps of digesting genomic
DNA with a restriction endonuclease to obtain DNA fragments; preferably,
separating the fragments by size using gel electrophoresis; probing said DNA
fragments under hybridizing conditions with a detectably labeled gene probe that
hybridizes to a nucleic acid molecule containing a CAG repeat region of an isolated
SCA1 gene; detecting probe DNA that has hybridized to said DNA fragments;
and analyzing the DNA fragments for a (CAG)n region characteristic of the normal
or affected forms of the SCA1 gene.

The present invention also provides a protein (or portions thereof)
encoded by the SCA1 gene and antibodies (polyclonal or monoclonal) raised
against the protein or portions thereof. The antibodies can be used in methods of
isolating antigenic protein expressed by the SCA1 gene. For example, they can be
added to a biological sample containing the antigenic protein to form an antibody-antigen
complex, which can be isolated from the sample, and the antigenic protein can then be
subjected to amino acid sequencing. This can be done while the protein is still
complexed with the antibody.

Thus, the present invention provides methods to determine the
presence or absence of an affected form of the SCA1 gene, which can be based on
RNA- or DNA-based detection methods (preferably, the methods involve isolating
and analyzing genomic DNA) or on protein-based detection methods. These
methods include, for example, PCR-based methods, direct nucleic acid sequencing,
and measuring expression of the SCA1 gene by measuring the amount of mRNA
or of ataxin-1 protein expressed. The methods
of the present invention also include determining the size of the repeat region of the
nucleic acid or amino acid molecules.

As used herein, the term "isolated (and purified)" means that the
nucleic acid molecule, gene, or oligonucleotide is essentially free from the
remainder of the human genome and associated cellular or other impurities. This
does not mean that the product has to have been extracted from the human genome;
rather, the product could be, for example, a synthetic or cloned product. As used
herein, the term "nucleic acid molecule" means any single- or double-stranded RNA
or DNA molecule, such as mRNA, cDNA, and genomic DNA.

As used herein, the term "SCA1 gene" means the
deoxyribopolynucleotide of about 450 kb (10.5-11-kb transcript), located within the
short arm of chromosome 6 between markers D6S89 and D6S274, that contains an
unstable CAG repeat region. This term, therefore, refers to numerous unique genes
that are substantially the same except for the content of the CAG repeat region. A
representative example of the SCA1 gene transcript for a normal individual is shown
in Figure 15. Included within the scope of this term is any ribo- or deoxyribopolynucleotide
containing zero, one, or more nucleotide substitutions that also
encodes the protein ataxin-1. Included in the term "SCA1 gene" is any
polynucleotide as described in the previous sentence that has different numbers of
CAG and/or CAT repeats in the polymorphic CAG repeat region. It is understood
also that the term "SCA1 gene" includes both the polypeptide-encoding region and
the regions that encode the 5' and 3' untranslated segments of the mRNA for SCA1.
Although the SCA1 gene described herein is described in terms of the human
genome, it is envisioned that other mammals, e.g., mice, may also have a very
similar gene containing a CAG repeat region that could be used to produce
oligonucleotides, for example, that are useful in diagnosing the SCA1 disorder in
humans.

As used herein, the term "ataxin-1" means the gene product of the
SCA1 gene, i.e., the protein encoded by the open reading frame of the SCA1 gene and
any protein substantially equivalent thereto, including all proteins of different
lengths (e.g., 20-90 kD, preferably 60-90 kD) encoded by said open reading frame
that start at each in-frame ATG translation start site. The term "ataxin-1" further
includes all proteins with essentially the same N-terminal and C-terminal sequences
but different numbers of glutamine (Q) and/or histidine (H) repeats (primarily
glutamine repeats) in the polymorphic repeat region.

As used herein, the term "polymorphic CAG repeat region" or simply
"CAG repeat region" means that region of the SCA1 gene that encodes a string of
glutamine residues that varies in number from allele to
allele, and which can range in number from 2 to 80 or more. Moreover, the
polymorphic CAG repeat regions can contain CAT (encoding histidine) in place of
CAG, although CAT is much less common than CAG in this region. It is to be
understood that when referring to nucleic acid molecules containing the CAG repeat
region, this includes RNA molecules containing the corresponding GUC repeat
region.
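
Because CAG encodes glutamine (Q) and CAT encodes histidine (H), the amino acid content of a repeat tract follows directly from its codons. The short Python sketch below translates an in-frame tract and also accepts RNA input by mapping U to T. The function name and the use of "X" for any other codon are illustrative choices, not part of the original text.

```python
# Only the two codons that occur in the repeat region are needed here.
REPEAT_CODON_TABLE = {"CAG": "Q", "CAT": "H"}


def translate_repeat(tract: str) -> str:
    """Translate an in-frame CAG/CAT tract (DNA or RNA) to one-letter amino acids.

    Codons other than CAG and CAT are reported as 'X'.
    """
    dna = tract.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    return "".join(
        REPEAT_CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3)
    )


if __name__ == "__main__":
    tract = "CAG" * 12 + "CAT" + "CAG" + "CAT" + "CAG" * 13  # hypothetical allele
    protein = translate_repeat(tract)
    print(protein, protein.count("Q"), protein.count("H"))
```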

As used herein, an "affected" gene refers to the allele of the SCA1
gene that, when present in an individual, is the cause of spinocerebellar ataxia type
1, and an "affected" individual has the symptoms of autosomal dominant
spinocerebellar ataxia type 1. Individuals with only "normal" SCA1 genes do not
possess the symptoms of SCA1. The term "allele" means a genetic variation
associated with a coding region; that is, an alternative form of the gene.

As used herein, "hybridizes" means that the oligonucleotide forms a
noncovalent interaction with the target nucleic acid molecule under
standard stringency conditions. The hybridizing oligonucleotide may contain nonhybridizing
nucleotides that do not interfere with forming the noncovalent interaction, e.g., a
restriction enzyme recognition site to facilitate cloning.

Brief Description of the Drawings
Figure 1. Sequence of the 3.36-kb EcoRI fragment of the normal
SCA1 gene located within the short arm of chromosome 6. It is within this
fragment that the CAG repeat region mutations associated with
autosomal dominant spinocerebellar ataxia type 1 occur.

Figure 2. Sequence information for five affected individuals in the
CAG repeat region, i.e., the CAG trinucleotide repeat, and its flanking regions of the
SCA1 gene located within the short arm of chromosome 6.

Figure 3. Sequence of the CAG trinucleotide repeat and its flanking
regions. About 500 nucleotides in a single strand of DNA of the 3.36-kb EcoRI
fragment of the SCA1 gene shown in Figure 1 are represented. The locations of PCR
primers are shown by solid lines with arrowheads.

Figure 4. Summary of SCA1 recombination events that led to the
precise mapping of the SCA1 locus. Recombinant disease-carrying chromosomes
are shown for the markers indicated above. A schematic diagram of the relevant
region of 6p22 (not drawn to scale) is shown at the top of the figure. Families are
coded as follows: TX = Houston, MN = Minnesota, MI = Michigan, IT = Italy.
Each recombination event is given a number following the family code.
Figure 5. Regional localization of 6p22-p23 STSs by PCR analysis
of radiation reduced hybrids. Three panels (a-c) demonstrate the regional
localization of D6S274, D6S288, and AM10GA. In each panel, PCR amplification
results are shown for genomic DNA, the 1-7 cell line (which retains 6p), the radiation
reduced hybrids R17, R72, R86, and R54, and RJK88 hamster DNA. A blank
control (c) is shown for every panel. R86 has previously been shown to retain
D6S89; R17 and R72 are known to contain D6S88 and D6S108, two DNA markers
that map centromeric to D6S89. An amplification product is seen in 1-7, R17,
R72, and R86 for D6S274 and D6S288, whereas the amplification product for
AM10GA is seen only in 1-7 and R86, confirming that D6S274 and D6S288 map
centromeric to AM10GA and D6S89.

Figure 6. A schematic diagram of the 6p22-p23 region showing the new
markers and the YAC contig. At the bottom of the diagram, the radiation reduced hybrid
panel used for regional mapping is shown. YAC clones are represented as
dark lines; open segments indicate a noncontiguous region of DNA. The
discontinuity shown in YAC clone 351B10 indicates that this YAC has an internal
deletion. All of the ends of the YAC clones that were isolated are designated by an
"L" for the left end or an "R" for the right end.

Figure 7. Genotypic data for 6p22-p23 dinucleotide repeat markers
are shown for a reduced pedigree from the MN-SCA1 kindred. This figure
summarizes a second recombination event that led to the precise mapping of the
SCA1 locus.

Figure 8. Long-range restriction maps of YACs 227B1, 60H7,
195B5, A250D5, and 379C2. YACs 351B10, 172B5, 172B5, and 168F1 were also
used in the restriction analysis (data not shown). The restriction sites are marked as
N, NotI; B, BssHII; Nr, NruI; M, MluI; S, SacII; and Sa, SalI. A summary map of
the SCA1 gene region with the position of the DNA markers used as probes (boxes)
is shown. The centromere-telomere orientation is indicated by cen/tel, respectively.

Figure 9. Physical map of the SCA1 region. The positions of
various genetic markers and sequence tagged sites (STSs) relative to the overlapping
YAC clones are shown. AM10 and FLB1 are STSs developed using a radiation
reduced hybrid retaining chromosome 6p22-p23; A250D5-L and 195B5-L are STSs
from insert termini of YACs A250D5 and 195B5. D6S89, D6S109, D6S288,
D6S274, and AM10GA are dinucleotide repeat markers used in the genetic analysis
of SCA1 families. The SCA1 candidate region is flanked by the D6S274 and
D6S89 markers, which identify the closest recombination events. The YAC clones
shown here are indicated by the cross-hatched markings. YAC 172B5 has two non-contiguous
segments of DNA, as indicated by the open bar for the non-6p segment.
The YACs are designated according to the St. Louis and CEPH libraries. The position
of the cosmid contig (C), which contains the overlapping (CAG)n-positive cosmids,
is indicated by a solid black bar. The overlap between the YACs was
determined by long-range restriction analysis. Orientation is indicated as
centromeric (Cen) and telomeric (Tel).

Figure 10. Southern blot analysis of leukocyte DNA using the 3.36-kb
EcoRI fragment that contains the repeat as a probe. Figure 10a: TaqI-digested
DNA from a TX-SCA1 kindred. The unaffected spouse has a single
fragment at 2830 bp. The affected individual with onset at 25 years of age has the
2830-bp fragment as well as a 2930-bp fragment. The affected child with onset at 4
years inherited the normal 2830-bp fragment from her mother and has a new fragment of
3000 bp not seen in either parent. Figure 10b: TaqI-digested DNA from
individuals from an MN-SCA1 kindred. The unaffected spouse and the unaffected
sibling have a 2830-bp fragment. The two affected brothers have the 2830-bp
fragment as well as an expanded fragment of 2900 bp in the sib with onset at 25
years and 2970 bp in the sib with onset at 9 years. Figure 10c: BstNI-digested
DNA from the TX-SCA1 kindred. Lanes 1-3 are from the same kindred depicted in
Figure 10a. The normal fragment size is 530 bp; in individuals with onset at 25-30 years
(lanes 1 and 4) the fragment expands to 610 bp. In the individual with onset at 15
years of age (lane 7) the fragment size is 640 bp, and in the individual with onset at
4 years (lane 3) the fragment size is 680 bp. The DNA in lane 5 is from a 14-year-old
child who is asymptomatic.
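
Because each CAG unit adds 3 bp, the difference between an expanded fragment and the normal fragment in the same digest gives a rough estimate of how many extra repeat units the expanded allele carries. The Python sketch below applies this arithmetic to the fragment sizes quoted for Figure 10. It is only an approximation: gel-based sizes are themselves estimates, and the normal fragment already contains an unspecified normal number of repeats, so the result is the excess over that allele rather than an absolute repeat count.

```python
def extra_repeat_units(normal_bp: int, expanded_bp: int) -> float:
    """Approximate extra CAG units: each repeat unit adds 3 bp to the fragment."""
    return (expanded_bp - normal_bp) / 3


# Fragment sizes (bp) quoted in the Figure 10 legend; labels are descriptive only.
FIGURE_10 = [
    ("10a TaqI, onset 25 y", 2830, 2930),
    ("10a TaqI, onset 4 y", 2830, 3000),
    ("10b TaqI, onset 25 y", 2830, 2900),
    ("10b TaqI, onset 9 y", 2830, 2970),
    ("10c BstNI, onset 25-30 y", 530, 610),
    ("10c BstNI, onset 15 y", 530, 640),
    ("10c BstNI, onset 4 y", 530, 680),
]

if __name__ == "__main__":
    for label, normal_bp, expanded_bp in FIGURE_10:
        extra = extra_repeat_units(normal_bp, expanded_bp)
        print(f"{label:26s} +{expanded_bp - normal_bp:4d} bp  ~{extra:.0f} extra repeats")
```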

Figure 11. Analysis of the PCR-amplified products containing the
trinucleotide repeat tract in normal and SCA1 individuals. The CAG-a/CAG-b
primer pair was used in panel (a), whereas the Rep-1/Rep-2 primer pair was used in
panel (b). The individuals in lanes 1, 2, and 3 in panel (a) are brothers. The ranges
for the normal (NL) and expanded (EXP) (CAG)n repeat units are indicated.

Figure 12. A scatter plot of age-at-onset in years versus the
number of (CAG)n repeat units is shown to demonstrate the correlation between
age-at-onset and the size of the expansion. A linear correlation coefficient of
-0.845 was obtained. In addition, given the non-linear pattern of the plot, a
curvilinear correlation coefficient was calculated; its value is
-0.936.
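
The linear correlation coefficient reported above is Pearson's r between age-at-onset and repeat number. The underlying data points are not reproduced in the text, so the Python sketch below uses made-up values purely to show the calculation. The method behind the curvilinear coefficient is not specified and is not reproduced here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical (repeat units, age-at-onset in years) pairs, NOT data from Figure 12.
    repeats = [42, 46, 50, 55, 60, 70, 81]
    onset = [45, 38, 33, 27, 22, 14, 6]
    print(f"r = {pearson_r(repeats, onset):.3f}")
```

A negative r, as in Figure 12, means that larger repeat numbers go with earlier onset.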

Figure 13. Schematic representation of the SCA1 cDNA contig. A
subset of overlapping phage cDNA clones (black bars) and a 5'-RACE-PCR product
(R1) spanning 10.66 kb of the SCA1 transcript are shown. cDNA clone 31-5
contains the entire coding region for the SCA1 gene product, ataxin-1. On top, a
schematic shows the structure of the SCA1 transcript; the sizes of the coding region
(rectangle) as well as the 5'UTR and the 3'UTR (thin lines) are indicated. The
position of the CAG repeat within the coding region is also shown. An asterisk
indicates the clones used as probes to screen the cDNA libraries. At the bottom, the
positions of BamHI (B), HindIII (H), and TaqI (T) restriction sites are shown.

Figure 14. Northern blot analysis of the SCA1 gene using RNAs
from multiple human tissues. The panel on the left is probed with a PCR product
from a portion of the coding region (bp 2460 to bp 3432). The panel on the right is
hybridized with the 3 J cDNA clone from the 3'UTR. An ~11-kb transcript is
detected in RNAs from all tissues using both probes as well as the cDNA clones
31-5 and 8-8, both of which contain the CAG repeat (Figure 13).

Figure 15. The sequence of the SCA1 transcript. The sequences of
primers 9b, 5F, and 5R (bp 129-147, bp 173-191, and bp 538-518, respectively, in the
5' to 3' orientation) are underlined. The protein sequence encoded by the DNA is
shown below the DNA sequence. The CAG repeat region extends from about bp 1524 to
about bp 1613.
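
Taking the approximate positions given in the Figure 15 legend as inclusive nucleotide coordinates (an assumption), the repeat region of this normal transcript corresponds to roughly

$$
n \approx \frac{1613 - 1524 + 1}{3} = \frac{90}{3} = 30 \ \text{trinucleotide units},
$$

which falls within the 19-36 range given for normal alleles in Figure 20. Because both endpoints are described as approximate, this is an estimate rather than an exact count.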

Figure 16. a. The structure of the SCA1 transcript and the various
splice variants. The schematic on top represents the nine exons (not drawn to scale)
and their respective sizes. The stippled areas indicate the coding region. The
structures of five cDNA clones representing different splice variants of the SCA1
transcript are also shown. Clones 8-8 and 8-9b are phage clones; RT-PCR1 and
RT-PCR2 are two clones obtained by RT-PCR carried out on cerebellar poly(A)+
RNA using the primers 9b and 5R (Figure 15). Only 30 bp of exon 1 were present
in clone 8-9b and the RT-PCR products, as indicated by the broken line in the
rectangles. b. Detection of alternative splicing of the SCA1 transcript in cerebellar
poly(A)+ RNA (CBL RNA). RT-PCR analysis was carried out using two sets of
primers: 9b-5R and 5F-5R. PCR products of the expected size were detected in
CBL RNA in the presence of reverse transcriptase (+RT) with both pairs of primers.
Using the 9b-5R pair, at least two larger PCR products were also detected. Using the
5F-5R pair for RT-PCR at annealing temperatures below 60 °C, some faint bands in the same size
range as those seen using the 9b-5R primer pair were also seen. Clones 8-8 and 8-9b are the
phage clones used as positive controls. The sizes of the relevant bands of the
molecular weight marker (ΦX174 cut with HaeIII) are indicated on the left.

Figure 17. Intron-exon boundaries of the SCA1 gene. Splice
acceptor and splice donor sites are indicated in bold letters. The numbers at the
beginning and the end of each exon refer to the position in the composite sequence
of SCA1 in Figure 15. Uppercase letters indicate exon sequences; lowercase letters
indicate intron sequences. Y = pyrimidine; R = purine; N = undefined.

Figure 18. Genomic structure of the SCA1 gene. The nine exons of
the SCA1 gene (solid rectangles not drawn to scale) were localized based on the
restriction map of the SCA1 region by Southern analysis using rare-cutter DNA
digests from several YAC clones. A representative map using YAC clone 227B1,
which encompasses the SCA1 gene, is shown. The restriction map of this YAC has
been confirmed by analysis of four overlapping YAC clones in the region. The
centromere-telomere orientation is indicated by CEN-TEL, respectively. L = left
YAC end; R = right YAC end; B = BssHII; C = CspI; M = MluI; N = NotI; Nr = NruI;
S = SacII.

Figure 19. Analysis of expression of the expanded SCA1 allele.
RT-PCR was carried out on lymphoblast poly(A)+ RNA from one unaffected
individual (lane 1) and four SCA1 patients (lanes 2 through 5) using primers Rep-1
and Rep-2. This analysis shows that both the normal and the expanded SCA1 alleles
are transcribed. The number of repeat units for each allele is indicated below
each lane; lane 6 is the RT-minus control.

Figure 20. Distributions of CAG repeat lengths from unaffected 
control individuals and from SCA1 alleles. Normal alleles range in size from 19 to 
36 repeat units while disease alleles contain from 42 to 81 repeats. 

Detailed Description

Substantial efforts have been made to localize the SCA1 gene using
genetic and physical mapping methods. Genetically, SCA1 is flanked on the
centromeric side by D6S88 at a recombination fraction of approximately 0.08
(based on marker-marker distances using the Centre d'Etude du Polymorphisme
Humain (CEPH) reference families) and on the telomeric side by F13A at a
recombination fraction of 0.19. See L.P.W. Ranum et al., Am. J. Hum. Genet., 49,
31-41 (1991). Both markers are quite distant and are not practical for use in efforts
aimed at cloning the SCA1 gene. The D6S89 marker maps closer to the SCA1
gene.

To localize SCA1 more precisely, five dinucleotide polymorphisms
near D6S89 have been identified. A new marker, AM10GA, demonstrates no
recombination with SCA1. Linkage analysis and analysis of recombination events
confirm that SCA1 maps centromeric to D6S89, with D6S109 as the other flanking
marker at the centromeric end, and establish the following order:
centromere-D6S109-AM10GA/SCA1-D6S89-LR40-D6S202-telomere. The genetic distance
between the two flanking markers D6S109 and D6S89 is about 6.7 cM based on
linkage analysis using 40 reference families from the Centre d'Etude du
Polymorphisme Humain (CEPH).
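
Recombination fractions such as the 0.08 and 0.19 values above are converted into map distances in centimorgans (cM) with a mapping function. The text does not say which function underlies the 6.7-cM figure, so the Python sketch below simply shows two standard choices, the Haldane and Kosambi functions, as an illustration; it is not a reconstruction of the original linkage analysis.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM: d = -50 * ln(1 - 2r), valid for 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM: d = 25 * ln((1 + 2r) / (1 - 2r)), 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    for label, r in [("D6S88-SCA1", 0.08), ("SCA1-F13A", 0.19)]:
        print(f"{label}: r = {r:.2f}  Haldane {haldane_cm(r):.1f} cM  Kosambi {kosambi_cm(r):.1f} cM")
```

For small recombination fractions both functions give roughly 100 × r cM; they diverge as r grows because they model crossover interference differently.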


A. SCA1 Gene and Method of Diagnosis

The candidate region on the short arm of chromosome 6
containing the SCA1 locus is about 1.2 Mb in size and is flanked by D6S274 on the
centromeric side and D6S89 on the telomeric side. The SCA1 gene spans 450 kb of
genomic DNA and is organized in nine exons (Figure 15 is representative of the
SCA1 transcript from a normal individual). The SCA1 transcript (i.e., mRNA or cDNA
clone) is about 10.6-11 kb. The gene is transcribed from both normal and affected
SCA1 alleles. The structure of the gene is unusual in that it contains seven exons in
the 5'-untranslated region and two large exons (2080 bp and 7805 bp) that contain a
2448-bp coding region and a 7277-bp 3'-untranslated region. The first four noncoding
exons undergo extensive alternative splicing in several tissues.

The gene for SCA1 contains a highly polymorphic CAG repeat that
is located within a 3.36-kb fragment produced by digestion of the candidate region
with the restriction enzyme EcoRI. The CAG repeat region preferably lies within
the coding region and codes for polyglutamine. This region of CAG repeating
sequences is unstable and expanded in individuals with SCA1. Southern and PCR
analyses of the (CAG)n repeat demonstrate a correlation between the size of the
repeat expansion and both the age-at-onset of SCA1 and the severity of the disorder. That is,
individuals with more repeat units (or longer repeat tracts) tend to have both an earlier
age of onset and a more severe disease course. These results demonstrate that SCA1,
like fragile X syndrome, myotonic dystrophy, X-linked spinobulbar muscular
atrophy, and Huntington disease, displays a mutational mechanism involving
expansion of an unstable trinucleotide repeat.

The identification of a trinucleotide repeat expansion associated with
SCA1 allows for improved diagnosis of the disease. Thus, in addition to being
directed to the gene for SCA1 and the protein encoded thereby, the present invention
also relates to methods of diagnosing SCA1. These diagnostic methods can involve
any known method for detecting a specific fragment of DNA. These methods can
include, for example, direct detection of the DNA or indirect detection through RNA or
proteins. For example, Southern or Northern blot hybridization
techniques using labeled probes can be used. Alternatively, PCR techniques can be
used with novel primers that amplify the CAG repeat region of the EcoRI
fragment. Nucleic acid sequencing can also be used as a direct method of
determining the number of CAG repeats.

For example, DNA probes can be used for identifying DNA
segments of the affected allele of the SCA1 gene. DNA probes are segments of
labeled, single-stranded DNA that will hybridize, or noncovalently bind, with
complementary single-stranded DNA derived from the gene sought to be identified.
The probe can be labeled with any suitable label known to those skilled in the art,
including radioactive and nonradioactive labels. Typical radioactive labels include
³²P, ¹²⁵I, ³⁵S, and the like. Nonradioactive labels include, for example, ligands such
as biotin or digoxigenin, enzymes such as phosphatases or peroxidases,
chemiluminescent compounds such as luciferin, and fluorescent compounds like
fluorescein and its derivatives. The probe may also be labeled at both ends with
different types of labels for ease of separation, for example, by using an isotopic
label at one end and a biotin label at the other end.

Using DNA probe analysis, the target DNA can be derived by the
enzymatic digestion, fractionation, and denaturation of genomic DNA to yield a
complex mixture incorporating the DNA from many different genes, including DNA
from the short arm of chromosome 6, which includes the SCA1 locus. A specific
DNA gene probe will hybridize only with DNA derived from its target gene or gene
fragment, and the resultant complex can be isolated and identified by techniques
known in the art.

In general, for detecting the presence of a DNA sequence located
within the SCA1 gene, the genomic DNA is digested with a restriction endonuclease
to obtain DNA fragments. The source of genomic DNA to be tested can be any
biological specimen that contains DNA. Examples include specimens of blood,
semen, vaginal swabs, tissue, hair, and body fluids. The restriction endonuclease
can be any that will cut the genomic DNA into fragments of double-stranded DNA
having a particular nucleotide sequence. The specificities of numerous
endonucleases are well known and can be found in a variety of publications, e.g.,
Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory, New York (1982). That manual is incorporated herein by reference in
its entirety. Preferred restriction endonuclease enzymes include EcoRI, TaqI, and
BstNI. EcoRI is particularly preferred.
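
The digestion step can be modeled in Python by locating recognition sites and cutting a linear sequence at each enzyme's top-strand cleavage position: G^AATTC for EcoRI, T^CGA for TaqI, and CC^WGG (W = A or T) for BstNI. The sketch below is a simplification that reports only top-strand fragment lengths, ignores sticky-end overhangs and methylation sensitivity, and uses a hypothetical sequence; it shows why an expanded CAG tract lying between two sites yields a larger fragment on a Southern blot.

```python
import re
from typing import Dict, List, Tuple

# Recognition pattern (regex) and top-strand cut offset from the start of the site.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "TaqI": ("TCGA", 1),       # T^CGA
    "BstNI": ("CC[AT]GG", 2),  # CC^WGG
}


def digest(seq: str, enzyme: str) -> List[int]:
    """Return top-strand fragment lengths after complete digestion of linear DNA."""
    pattern, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping sites.
    cuts = [m.start() + offset for m in re.finditer(f"(?={pattern})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical sequence with a (CAG)n tract between two EcoRI sites.
    for n in (30, 45):
        seq = "AAAA" + "GAATTC" + "TT" + "CAG" * n + "TT" + "GAATTC" + "CC"
        print(n, digest(seq, "EcoRI"))
```

In this toy example the fragment carrying the repeat grows by 3 bp for each additional CAG unit, while the flanking fragments are unchanged.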

Diagnosis of the disease can alternatively involve the use of the
polymerase chain reaction sequence amplification method (PCR) using novel
primers. U.S. Patent No. 4,683,195 (Mullis et al., issued July 28, 1987) describes a
process for amplifying, detecting, and/or cloning nucleic acid sequences. The
method involves treating extracted DNA to form single-stranded complementary
strands, treating the separate complementary strands of DNA with two
oligonucleotide primers, extending the primers to form complementary extension
products that act as templates for synthesizing the desired nucleic acid molecule,
and detecting the amplified molecule. More specifically, the method steps of
treating the DNA with primers and extending the primers include the steps of:
adding a pair of oligonucleotide primers, wherein one primer of the pair is
substantially complementary to part of the sequence in the sense strand and the other
primer of each pair is substantially complementary to a different part of the same
sequence in the complementary antisense strand; annealing the paired primers to the
complementary molecule; simultaneously extending the annealed primers from a 3'
terminus of each primer to synthesize an extension product complementary to the
strands annealed to each primer, wherein said extension products, after separation
from the complement serve as templates for the synthesis of an extension product for the other primer of each pair; and separating said extension products from said templates to produce single-stranded molecules. Variations of the method are described in U.S. Patent No. 4,683,194 (Saiki et al., issued July 28, 1987). The polymerase chain reaction sequence amplification method is also described by Saiki et al., Science, 230, 1350-1354 (1985) and Scharf et al., Science, 224, 163-166 (1986). The discussion of these techniques in each of these references is incorporated herein by reference.
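The primer-extension scheme just described can be illustrated with a minimal in-silico sketch. It assumes the template is given as the sense strand, that the forward primer carries the sense-strand sequence (so it anneals to the antisense strand), and that the reverse primer anneals to the sense strand (so its reverse complement appears in the sense strand downstream). The product is the stretch from the forward primer through the reverse primer site. Function names and the toy template are hypothetical; exact matching is assumed, and mismatches, annealing temperature, and polymerase behavior are ignored. The idealized doubling arithmetic, N = N0 * 2^n, is included for comparison; real reactions only approximate it.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predicted_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Product expected from an exact-match primer pair on a sense-strand template.

    Returns None if either primer site is missing.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev)  # where the reverse primer binds, read on the sense strand
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


def ideal_copies(initial: int, cycles: int) -> int:
    """Copy number after `cycles` rounds of perfect doubling."""
    return initial * 2 ** cycles


template = "AAAACCGTAGTTTTTTGACTGAAAAA"  # toy template, not SCA1 sequence
print(predicted_amplicon(template, "CCGTAG", "TCAGTC"))  # CCGTAGTTTTTTGACTGA
print(ideal_copies(1, 30))  # 1073741824
```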

The primers are oligonucleotides, either synthetic or naturally occurring, capable of acting as a point of initiation of synthesis of a product complementary to the region of the DNA sequence containing the CAG repeating trinucleotides of the SCA1 locus on the short arm of chromosome 6. The primer includes a nucleotide sequence substantially complementary to a portion of a strand of an affected or a normal allele of a fragment (preferably a 3.36 kb EcoRI fragment) of an SCA1 gene having a (CAG)n region. The primer sequence has at least about 11 nucleotides, preferably at least about 16 nucleotides and no more than about 35 nucleotides. The primers are chosen such that they produce a primed product of about 70-350 base pairs, preferably about 100-300 base pairs. More preferably, the primers are chosen such that their nucleotide sequence is substantially complementary to a portion of a strand of an affected or a normal allele within about 150 nucleotides on either side of the (CAG)n region, including directly adjacent to the (CAG)n region.
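The numeric design criteria above (primer length of at least about 11 and no more than about 35 nucleotides, preferably at least about 16; product of about 70-350 bp, preferably about 100-300 bp) can be expressed as a simple check. This is a minimal sketch: the function name is hypothetical, and treating the "about" values as hard cutoffs is an assumption made only for illustration.

```python
def check_primer_pair(fwd: str, rev: str, product_bp: int) -> dict:
    """Compare a primer pair and its expected product size with the stated ranges.

    The cutoffs are the text's approximate values applied literally; treat the
    result as a guideline, not a pass/fail rule.
    """
    def length_ok(primer: str) -> bool:
        return 11 <= len(primer) <= 35

    def length_preferred(primer: str) -> bool:
        return 16 <= len(primer) <= 35

    return {
        "lengths_ok": length_ok(fwd) and length_ok(rev),
        "lengths_preferred": length_preferred(fwd) and length_preferred(rev),
        "product_ok": 70 <= product_bp <= 350,
        "product_preferred": 100 <= product_bp <= 300,
    }


# CAG-a and CAG-b sequences from the text; the 250 bp product size is hypothetical.
print(check_primer_pair("CCGGAGCCCTGCTGAGGT", "CCAGACGCCGGGACAC", 250))
```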

Examples of preferred primers are shown by solid lines with arrowheads in Figure 3. The primers are thus selected from the group consisting of CCGGAGCCCTGCTGAGGT (CAG-a), CCAGACGCCGGGACAC (CAG-b), AACTGGAAATGTGGACGTAC (Rep-1), CAACATGGGCAGTCTGAG (Rep-2), CCACCACTCCATCCCAGC (GCT-435), TGCTGGGCTGGTGGGGGG (GCT-214), CTCTCGGCTTTCTTGGTG (Pre-1), and GTACGTCCACATTTCCAGTT (Pre-2). These primers can be used in various combinations or with any other primer that can be designed to hybridize to a portion of DNA of a fragment (preferably a 3.36 kb EcoRI fragment) of an SCA1 gene having a CAG repeat region. For example, the primer labeled Rep-2 can be combined with the primer labeled CAG-a, and the primer labeled CAG-b can be combined with the primer labeled Rep-1. More preferably, the primers are used in pairs, for example CAG-a/CAG-b, Rep-1/Rep-2, and Rep-1/GCT-435. These primer sets successfully amplify the CAG repeat units of interest using PCR technology. Alternatively, they can be used in various known techniques to sequence the SCA1 gene.
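A quick way to compare the listed primers is to tabulate their length, G+C content, and a rough melting temperature. The following minimal sketch assumes the sequences are written 5' to 3' (the text does not say) and uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a crude estimate for short oligonucleotides that is not taken from the source and ignores salt and primer concentration.

```python
from typing import Tuple

# Primer sequences as listed in the text (orientation assumed 5'->3').
PRIMERS = {
    "CAG-a": "CCGGAGCCCTGCTGAGGT",
    "CAG-b": "CCAGACGCCGGGACAC",
    "Rep-1": "AACTGGAAATGTGGACGTAC",
    "Rep-2": "CAACATGGGCAGTCTGAG",
    "GCT-435": "CCACCACTCCATCCCAGC",
    "GCT-214": "TGCTGGGCTGGTGGGGGG",
    "Pre-1": "CTCTCGGCTTTCTTGGTG",
    "Pre-2": "GTACGTCCACATTTCCAGTT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_stats(seq: str) -> Tuple[int, int, int]:
    """Return (length, G+C count, Wallace-rule Tm in degrees C)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return len(s), gc, 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    length, gc, tm = primer_stats(seq)
    print(f"{name:8s} {length:2d} nt  GC {100 * gc / length:3.0f}%  Tm ~{tm} C")
```

For CAG-a, for example, this gives 18 nt, 13 G+C, and a Wallace estimate of about 62 °C. The same helper shows that Pre-2 is the exact reverse complement of Rep-1, so those two oligonucleotides anneal to the same site on opposite strands.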

As stated previously, other methods of diagnosis can be used as well. They can be based on the isolation and identification of the repeat region of genomic DNA (CAG repeat region), cDNA (CAG repeat region), mRNA (GUC repeat region), and protein products (glutamine repeat region). These include, for example, using a variety of electrophoresis techniques to detect slight changes in the nucleotide sequence of the SCA1 gene. Further nonlimiting examples include denaturing gradient electrophoresis, single strand conformational polymorphism gels, and nondenaturing gel electrophoresis techniques.
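Identifying the repeat region in DNA ultimately means measuring the length of the (CAG)n tract. The following minimal sketch assumes a sequence read is already available. It finds the longest uninterrupted run of CAG triplets; the function name and the example read are hypothetical. It does not handle interrupted repeats, sequencing error, or PCR stutter. It also does not classify an allele as normal or expanded, because no thresholds are given here.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Number of CAG units in the longest uninterrupted (CAG)n tract.

    CAG cannot overlap a shifted copy of itself, so a left-to-right greedy
    scan finds every maximal tract.
    """
    tracts = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(t) // 3 for t in tracts), default=0)


read = "TTCAGCAGCAGTTCAGCAGCAGCAGCAGAA"  # toy read, not SCA1 sequence
print(longest_cag_run(read))  # 5
```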

The mapping and cloning of the SCA1 gene allows the definitive diagnosis of one type of the dominantly inherited ataxias using a simple blood test. This represents the first step toward an unequivocal molecular classification of the dominant ataxias. A simple and reliable classification system for the ataxias is important because the clinical symptoms overlap extensively between the SCA1 and the non-SCA1 forms of the disease. Furthermore, a molecular test for the only known SCA1 mutation permits presymptomatic diagnosis of disease in known SCA1 families and allows for the identification of sporadic or isolated CAG repeat expansions where there is no family history of the disease. Thus, the present invention can be used in family counseling, in planning medical treatment, and in standard work-ups of patients with ataxia of unknown etiology.

B. Cloning

Cloning of SCA1 DNA into the appropriate replicable vectors allows expression of the gene product, ataxin-1, and makes the SCA1 gene available for further genetic engineering. Expression of ataxin-1 or portions thereof is useful because these gene products can be used as antigens to produce antibodies, as described in more detail below.

1. Isolation of DNA

DNA containing the SCA1 gene may be obtained from any cDNA library prepared from tissue believed to possess the SCA1 mRNA and to express it at a detectable level. Preferably, the cDNA library is from human fetal brain or adult cerebellum. Optionally, the SCA1 gene may be obtained from a genomic DNA library or by in vitro oligonucleotide synthesis from the complete nucleotide or amino acid sequence.

Libraries are screened with appropriate probes designed to identify the gene of interest or the protein encoded by it. Preferably, for cDNA libraries, suitable probes include oligonucleotides that consist of known or suspected portions of the SCA1 cDNA from the same or different species, and/or complementary or homologous cDNAs or fragments thereof that consist of the same or a similar gene. Optionally, for cDNA expression libraries (which express the protein), suitable probes include monoclonal or polyclonal antibodies that recognize and specifically bind to the SCA1 gene product, ataxin-1. Appropriate probes for screening genomic DNA libraries include, but are not limited to, oligonucleotides, cDNAs, or fragments thereof that consist of the same or a similar gene, and/or homologous genomic DNAs or fragments thereof. Screening the cDNA or genomic library with the selected probe may be accomplished using standard procedures.

Screening cDNA libraries using synthetic oligonucleotides as probes is a preferred method of practicing this invention. The oligonucleotide sequences selected as probes should be of sufficient length and sufficiently unambiguous to minimize false positives. The actual nucleotide sequences of the probes are usually designed based on regions of the SCA1 gene that have the least codon redundancy. The oligonucleotides may be degenerate at one or more positions, i.e., two or more different nucleotides may be incorporated into an oligonucleotide at a given position, resulting in multiple synthetic oligonucleotides. The use of degenerate oligonucleotides is of particular importance where a library is screened from a species in which preferential codon usage is not known.
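Degeneracy multiplies the number of distinct oligonucleotides in a probe mixture. The following minimal sketch assumes the probe is written with the standard IUPAC nucleotide ambiguity codes (the text does not specify a notation). It computes the degeneracy and expands the probe into every sequence it represents. The example probe is hypothetical.

```python
from itertools import product
from math import prod
from typing import List

# Standard IUPAC nucleotide ambiguity codes (assumed notation).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(probe: str) -> int:
    """Number of distinct oligonucleotides represented by a degenerate probe."""
    return prod(len(IUPAC[base]) for base in probe.upper())


def expand(probe: str) -> List[str]:
    """Every concrete sequence represented by a degenerate probe."""
    choices = [IUPAC[base] for base in probe.upper()]
    return ["".join(p) for p in product(*choices)]


# Glutamine codons are CAA and CAG, i.e. CAR in IUPAC notation.
probe = "CARCAR"  # hypothetical probe for Gln-Gln
print(degeneracy(probe))  # 4
print(expand(probe))      # ['CAACAA', 'CAACAG', 'CAGCAA', 'CAGCAG']
```

Because the mixture size grows multiplicatively (each fully ambiguous N position multiplies it by four), probes are best drawn from regions of low codon redundancy, as the text recommends.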

The oligonucleotide can be labeled such that it can be detected upon hybridization to DNA in the library being screened. A preferred method of labeling is to use ATP and polynucleotide kinase to radiolabel the 5' end of the oligonucleotide. However, other methods may be used to label the oligonucleotide, including, but not limited to, biotinylation or enzyme labeling.

Of particular interest is the SCA1 nucleic acid that encodes a full-length mRNA transcript, including the complete coding region for the gene product, ataxin-1. Nucleic acid containing the complete coding region can be obtained by screening selected cDNA libraries using the deduced amino acid sequence.

An alternative means to isolate the SCA1 gene is to use PCR methodology. This method requires the use of oligonucleotide primer probes that will hybridize to the SCA1 gene. Strategies for selection of PCR primer oligonucleotides are described below.

2. Insertion of DNA into Vector

The nucleic acid (e.g., cDNA or genomic DNA) containing the SCA1 gene is preferably inserted into a replicable vector for further cloning (amplification of the DNA) or for expression of the gene product, ataxin-1. Many vectors are available, and selection of the appropriate vector will depend on: 1) whether it is to be used for DNA amplification or for DNA expression; 2) the size of the nucleic acid to be inserted into the vector; and 3) the host cell to be transformed with the vector. Most expression vectors are "shuttle" vectors, i.e., they are capable of replication in at least one class of organism but can be transfected into another organism for expression. For example, a vector is cloned in E. coli and then the same vector is transfected into yeast or mammalian cells for expression even though it is not capable of replicating independently of the host cell chromosome. Each replicable vector contains various structural components depending on its function (amplification of DNA or expression of DNA) and the host cell with which it is compatible. These components are described in detail below.

Construction of suitable vectors employs standard ligation techniques known in the art. Isolated plasmids or DNA fragments are cleaved, tailored, and religated in the form desired to generate the plasmids required. Typically, the ligation mixtures are used to transform E. coli K12 strain 294 (ATCC 31,446) and successful transformants are selected by ampicillin or tetracycline resistance where appropriate. Plasmids from the transformants are prepared, analyzed by restriction endonuclease digestion, and/or sequenced by methods known in the art. See, e.g., Messing et al., Nucl. Acids Res., 9, 309 (1981) and Maxam et al., Methods in Enzymology, 65, 499 (1980).
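Restriction analysis of a plasmid preparation comes down to predicting fragment sizes from the positions of the cut sites on a circular molecule. The following minimal sketch assumes the cut positions are already known. The function name and the numbers in the example are hypothetical. On a circle, n distinct cuts give n fragments, and the last fragment wraps around position 0.

```python
from typing import List


def circular_fragments(plasmid_len: int, cuts: List[int]) -> List[int]:
    """Fragment lengths (largest first) from cutting a circular plasmid.

    A single cut only linearizes the plasmid; no cuts leaves it intact.
    """
    if not cuts:
        return [plasmid_len]  # uncut, still circular
    pos = sorted(set(c % plasmid_len for c in cuts))
    frags = [b - a for a, b in zip(pos, pos[1:])]
    frags.append(plasmid_len - pos[-1] + pos[0])  # fragment spanning the origin
    return sorted(frags, reverse=True)


print(circular_fragments(4000, [100, 1100, 2600]))  # [1500, 1500, 1000]
print(circular_fragments(4000, [500]))              # [4000]
```

Comparing predicted sizes like these with the bands seen on a gel is one way to confirm that an insert is present and in the expected orientation.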

Optionally, DNA may also be amplified by direct insertion into the host genome. This is readily accomplished using Bacillus species as hosts, for example, by including in the vector a DNA sequence that is complementary to a sequence found in Bacillus genomic DNA. Transfection of Bacillus with this vector results in homologous recombination with the genome and insertion of SCA1 DNA. However, the recovery of genomic DNA containing the SCA1 gene is more complex than that of an exogenously replicated vector because restriction enzyme digestion is required to excise the SCA1 DNA.

Replicable cloning and expression vector components generally 
include, but are not limited to, one or more of the following: a signal sequence, an 
origin of replication, one or more marker genes, an enhancer element, a promoter 
and a transcription termination sequence. 

Vector component: signal sequence. A signal sequence may be used to facilitate extracellular transport of a cloned protein. To this end, the SCA1 gene product ataxin-1 may be expressed not only directly, but also as a fusion product with a heterologous polypeptide, preferably a signal sequence or other polypeptide having a specific cleavage site at the N-terminus of the cloned protein or polypeptide. The signal sequence may be a component of the vector, or it may be a part of the SCA1 DNA that is inserted into the vector. The heterologous signal sequence selected should be one that is recognized and processed (i.e., cleaved by a signal peptidase) by the host cell. For prokaryotic host cells, a prokaryotic signal sequence may be selected, for example, from the group of the alkaline phosphatase, penicillinase, lpp, or heat-stable enterotoxin II leaders. For yeast secretion the signal sequence used may be, for example, the yeast invertase, alpha factor, or acid phosphatase leaders. In mammalian cell expression, a native signal sequence may be satisfactory, although other mammalian signal sequences may be suitable, such as signal sequences from secreted polypeptides of the same or related species, as well as viral secretory leaders, for example, the herpes simplex gD signal.

Vector component: origin of replication. Both expression and cloning vectors contain a nucleic acid sequence that enables the vector to replicate in one or more selected host cells. Generally, in cloning vectors this sequence is one that enables the vector to replicate independently of the host chromosomal DNA, and includes origins of replication or autonomously replicating sequences. Such sequences are well known for a variety of bacteria, yeast and viruses. The origin of replication from the plasmid pBR322 is suitable for most Gram-negative bacteria, the 2μ plasmid origin is suitable for yeast, and various viral origins (SV40, polyoma, adenovirus, VSV or BPV) are useful for cloning vectors in mammalian cells. Generally, the origin of replication component is not needed for mammalian expression vectors (the SV40 origin may typically be used only because it contains the early promoter).

Vector component: marker gene. Expression and cloning vectors may contain a marker gene, also termed a selection gene or selectable marker. This gene encodes a protein necessary for the survival or growth of transformed host cells grown in a selective culture medium. Host cells not transformed with the vector containing the selection gene will not survive in the culture medium. Typical selection genes encode proteins that: (a) confer resistance to antibiotics or other toxins, e.g., ampicillin, neomycin, methotrexate, streptomycin or tetracycline; (b) complement auxotrophic deficiencies; or (c) supply critical nutrients not available from complex media, e.g., the gene encoding D-alanine racemase for Bacilli. One example of a selection scheme utilizes a drug to arrest growth of a host cell. Those cells that are successfully transformed with a heterologous gene express a protein conferring drug resistance and thus survive the selection regimen.

Examples of suitable selectable markers for mammalian cells are those that enable the identification of cells competent to take up the SCA1 nucleic acid, such as dihydrofolate reductase (DHFR) or thymidine kinase. The mammalian cell transformants are placed under selection pressure that only the transformants are uniquely adapted to survive, by virtue of having taken up the marker. For example, cells transformed with the DHFR selection gene are first identified by culturing all the transformants in a culture medium that contains methotrexate, a competitive antagonist of DHFR. An appropriate host cell when wild-type DHFR is employed is the Chinese hamster ovary (CHO) cell line deficient in DHFR activity, prepared and propagated as described by Urlaub et al., Proc. Natl. Acad. Sci. USA, 77, 4216 (1980). The transformed cells are then exposed to increased levels of methotrexate. This leads to the synthesis of multiple copies of the DHFR gene and, concomitantly, multiple copies of the other DNA comprising the expression vectors, such as the SCA1 gene. This amplification technique can be used with any otherwise suitable host, e.g., ATCC No. CCL61 CHO-K1, notwithstanding the presence of endogenous DHFR if, for example, a mutant DHFR gene that is highly resistant to methotrexate is employed. Alternatively, host cells (particularly wild-type hosts that contain endogenous DHFR) transformed or co-transformed with SCA1 DNA, wild-type DHFR protein, and another selectable marker such as aminoglycoside 3'-phosphotransferase (APH) can be selected by cell growth in a medium containing a selection agent for the selectable marker, such as an aminoglycoside antibiotic, e.g., kanamycin or neomycin.

A suitable selection gene for use in yeast is the trp1 gene present in the yeast plasmid YRp7 (Stinchcomb et al., Nature, 282, 39 (1979); Kingsman et al., Gene, 7, 141 (1979); or Tschemper et al., Gene, 10, 157 (1980)). The trp1 gene provides a selection marker for a mutant strain of yeast lacking the ability to grow in tryptophan, for example, ATCC No. 44076 or PEP4-1 (Jones, Genetics, 85, 12 (1977)). The presence of the trp1 lesion in the yeast host cell genome then provides an effective environment for detecting transformation by growth in the absence of tryptophan. Similarly, Leu2-deficient yeast strains (ATCC 20,622 or 38,626) are complemented by known plasmids bearing the Leu2 gene.

Vector component: promoter. Expression and cloning vectors usually contain a promoter that is recognized by the host organism and is operably linked to the SCA1 nucleic acid. Promoters are untranslated sequences located upstream (5') to the start codon of a structural gene (generally within about 100 to 1000 bp) that control the transcription and translation of a particular nucleic acid sequence, such as the ataxin-1 nucleic acid sequence, to which they are operably linked. Such promoters typically fall into two classes, inducible and constitutive. Inducible promoters are promoters that initiate increased levels of transcription from DNA under their control in response to some change in culture conditions, e.g., the presence or absence of a nutrient or a change in temperature. In contrast, constitutive promoters produce a constant level of transcription of the cloned DNA segment.

A large number of promoters recognized by a variety of potential host cells are now well known in the art. Promoters are removed from their source DNA using a restriction enzyme digestion and inserted into the cloning vector using standard molecular biology techniques. Both the native SCA1 promoter sequence and many heterologous promoters can be used to direct amplification and/or expression of the SCA1 DNA. Heterologous promoters are preferred, as they generally permit greater transcription and higher yields of expressed protein as compared to the native promoter. Well-known promoters suitable for use with prokaryotic hosts include the beta-lactamase and lactose promoter systems, alkaline phosphatase, a tryptophan (trp) promoter system, and hybrid promoters such as the tac promoter. Such promoters can be ligated to SCA1 DNA using linkers or adapters to supply any required restriction sites. Promoters for use in bacterial systems may contain a Shine-Dalgarno sequence for ribosome binding.

Promoter sequences are known for eukaryotes. Virtually all eukaryotic genes have an AT-rich region located approximately 25 to 30 bp upstream from the site where transcription is initiated. Another sequence found 70 to 80 bases upstream from the start of transcription of many genes is the CXCAAT region, where X may be any nucleotide. At the 3' end of most eukaryotic genes is an AATAAA sequence that may be a signal for addition of the poly A tail to the 3' end of the coding sequence. All these sequences are suitably inserted into eukaryotic expression vectors. Examples of suitable promoting sequences for use with yeast hosts include the promoters for 3-phosphoglycerate kinase or other glycolytic enzymes, such as enolase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase, phosphoglucose isomerase, and glucokinase. Other yeast promoters, which are inducible promoters having the additional advantage of transcription controlled by growth conditions, are the promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid phosphatase, degradative enzymes associated with nitrogen metabolism, metallothionein, glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for maltose and galactose utilization.
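Two of the conserved elements just described, the CXCAAT region and the AATAAA sequence, can be located by simple pattern matching. The following minimal sketch reports 0-based start positions of every match on the given strand, including overlapping matches. The names and the toy sequence are hypothetical. The sketch does not score the AT-rich region, does not check the expected distances from the transcription start site, and is in no way a promoter predictor.

```python
import re
from typing import Dict, List

# Motifs named in the text; [ACGT] stands for "X = any nucleotide".
MOTIFS = {
    "CXCAAT": r"C[ACGT]CAAT",
    "AATAAA": r"AATAAA",  # possible poly(A) addition signal
}


def find_motifs(seq: str) -> Dict[str, List[int]]:
    """0-based start positions of each motif; overlaps are reported."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping occurrences each be found.
    return {
        name: [m.start() for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in MOTIFS.items()
    }


print(find_motifs("GGCCCAATGGTTTAATAAAGG"))  # {'CXCAAT': [2], 'AATAAA': [13]}
```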

SCA1 transcription from vectors in mammalian host cells can be controlled, for example, by promoters obtained from the genomes of viruses such as polyoma virus, fowlpox virus, adenovirus (such as Adenovirus 2), bovine papilloma virus, avian sarcoma virus, cytomegalovirus, a retrovirus, hepatitis B virus and, most preferably, Simian Virus 40 (SV40) (Fiers et al., Nature, 273, 113 (1978); Mulligan et al., Science, 209, 1422-1427 (1980); Pavlakis et al., Proc. Natl. Acad. Sci. USA, 78, 7398-7402 (1981)). Heterologous mammalian promoters (e.g., the actin promoter or an immunoglobulin promoter) and heat-shock promoters can also be used, as can the promoter normally associated with the SCA1 sequence itself, provided such promoters are compatible with the host cell systems.

Vector component: enhancer element. Transcription of SCA1 DNA by higher eukaryotes can be increased by inserting an enhancer sequence into the vector. Enhancers are cis-acting elements of DNA, usually about 10 to 300 bp in length, that act on a promoter to increase its transcription. Enhancers are relatively orientation- and position-independent, having been found 5' and 3' to the transcription unit, within an intron, and within the coding sequence itself. Many enhancer sequences are now known from mammalian genes (globin, elastase, albumin, alpha-fetoprotein, and insulin). Typically, however, an enhancer from a eukaryotic cell virus will be used. Examples include the SV40 enhancer on the late side of the replication origin, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers. The enhancer may be spliced into the vector at a position 5' or 3' to the SCA1 gene, but is preferably located at a site 5' of the promoter.

Vector component: transcription termination. Expression vectors used in eukaryotic host cells (yeast, fungi, insect, plant, animal, human or nucleated cells from other multicellular organisms) can also contain sequences necessary for the termination of transcription and for stabilizing the mRNA. Such sequences are commonly available from the 5' and, occasionally, 3' untranslated regions of eukaryotic or viral DNAs or cDNAs. These regions can contain nucleotide segments transcribed as polyadenylated fragments in the untranslated portion of mRNA encoding ataxin-1.

Preferably, the pMAL™-2 vectors (New England Biolabs, Beverly, MA) are used to create the expression vector. These vectors provide a convenient method for expressing and purifying ataxin-1 produced from the cloned SCA1 gene. The SCA1 gene is inserted downstream from the malE gene of E. coli, which encodes maltose-binding protein (MBP), resulting in the expression of an MBP fusion protein. The method uses the strong "tac" promoter and the malE translation initiation signals to give high-level expression of the cloned sequences, and a one-step purification of the fusion protein using MBP's affinity for maltose. The vectors express the malE gene (with or without its signal sequence) fused to the lacZα gene. Restriction sites between malE and lacZα are available for inserting the coding sequence of interest. Insertion inactivates the β-galactosidase α-fragment activity of the malE-lacZα fusion, which results in a blue to white color change on Xgal plates when the construction is transformed into an α-complementing host such as TB1 (T.C. Johnston et al., J. Biol. Chem., 261, 4805-4811 (1986)) or JM107 (C. Yanisch-Perron et al., Gene, 33, 103-119 (1985)). When present, the signal peptide on pre-MBP directs fusion proteins to the periplasm. For fusion proteins that can be successfully exported, this allows folding and disulfide bond formation to take place in the periplasm of E. coli, as well as allowing purification of the protein from the periplasm. The vectors carry the lacIq gene, which codes for the Lac repressor protein. This keeps expression from Ptac low in the absence of isopropyl β-D-thiogalactopyranoside (IPTG) induction. The pMAL™-2 vectors also contain the sequence coding for the recognition site of the specific protease factor Xa, located just 5' to the polylinker insertion sites. This allows MBP to be cleaved from ataxin-1 after purification. Factor Xa cleaves after its four amino acid recognition sequence, so that few or no vector-derived residues are attached to the protein of interest, depending on the site used for cloning.
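The factor Xa step can be illustrated with a minimal sketch. It assumes the recognition sequence Ile-Glu-Gly-Arg (IEGR in one-letter code), the factor Xa site used in pMAL vectors, which the text does not spell out, and cleaves immediately after the Arg. The fusion sequence is a made-up placeholder, not real MBP or ataxin-1 sequence. The sketch ignores secondary cleavage sites and the effects of protein folding on real digests.

```python
from typing import Optional, Tuple

FACTOR_XA_SITE = "IEGR"  # assumed recognition sequence; cleavage follows the Arg


def cleave_factor_xa(fusion: str, site: str = FACTOR_XA_SITE) -> Optional[Tuple[str, str]]:
    """Split a fusion protein at the first factor Xa site.

    Returns (carrier_part_including_site, released_protein), or None if the
    site is absent.
    """
    idx = fusion.find(site)
    if idx == -1:
        return None
    cut = idx + len(site)  # C-terminal side of the final Arg
    return fusion[:cut], fusion[cut:]


carrier, released = cleave_factor_xa("MKIEEGKLVIEGRMSTEQ")  # placeholder sequence
print(carrier, released)  # MKIEEGKLVIEGR MSTEQ
```

Because the cut falls right after the site, a coding sequence cloned directly after IEGR is released with no extra vector-derived residues, which matches the behavior the text describes.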

Also useful in the invention are expression vectors that provide for the transient expression in mammalian cells of SCA1 DNA. In general, transient expression
involves the use of an expression vector that is able to replicate efficiently in a host cell, such that the host cell accumulates many copies of the expression vector and, in turn, synthesizes high levels of a desired polypeptide encoded by the expression vector. Transient expression systems, comprising a suitable expression vector and a host cell, allow for the convenient positive identification of polypeptides encoded by cloned DNAs, as well as for the rapid screening of such polypeptides for desired biological or physiological properties. Thus, transient expression systems are particularly useful in the invention for purposes of identifying analogs and variants of ataxin-1 that have wild-type or variant biological activity.

3. Host Cells

Suitable host cells for cloning or expressing the vectors herein are the prokaryote, yeast, or higher eukaryotic cells described above. Suitable prokaryotes include eubacteria, such as Gram-negative or Gram-positive organisms, for example, E. coli, Bacilli such as B. subtilis, Pseudomonas species such as P. aeruginosa, Salmonella typhimurium, or Serratia marcescens. One preferred E. coli cloning host is E. coli 294 (ATCC 31,446), although other strains such as E. coli B, E. coli X1776 (ATCC 31,537), and E. coli W3110 (ATCC 27,325) are suitable. These examples are illustrative rather than limiting. Preferably the host cell should secrete minimal amounts of proteolytic enzymes. Alternatively, in vitro methods of cloning, e.g., PCR or other nucleic acid polymerase reactions, are suitable.

In addition to prokaryotes, eukaryotic microbes such as filamentous fungi or yeast are suitable hosts for SCA1-encoding vectors. Saccharomyces cerevisiae, or common baker's yeast, is the most commonly used among lower eukaryotic host microorganisms. However, a number of other genera, species, and strains are commonly available and useful herein, such as Schizosaccharomyces pombe; Kluyveromyces hosts such as, e.g., K. lactis, K. fragilis, K. bulgaricus, K. thermotolerans, and K. marxianus; Yarrowia; Pichia pastoris; Candida; Trichoderma reesei; Neurospora crassa; and filamentous fungi such as, e.g., Neurospora, Penicillium, Tolypocladium, and Aspergillus hosts such as A. nidulans.

Suitable host cells for the expression of glycosylated ataxin-1 are derived from multicellular organisms. Such host cells are capable of complex processing and glycosylation activities. In principle, any higher eukaryotic cell culture is workable, whether from vertebrate or invertebrate culture. Examples of
invertebrate cells include plant and insect cells. Numerous baculoviral strains and variants and corresponding permissive insect host cells from hosts such as Spodoptera frugiperda (caterpillar), Aedes aegypti (mosquito), Aedes albopictus (mosquito), Drosophila melanogaster (fruit fly), and Bombyx mori have been identified. See, e.g., Luckow et al., Bio/Technology, 6, 47-55 (1988); Miller et al., Genetic Engineering, 8, 277-279 (1986); and Maeda et al., Nature, 592-594 (1985). A variety of viral strains for transfection are publicly available, e.g., the L-1 variant of Autographa californica NPV and the Bm-5 strain of Bombyx mori NPV, and such viruses may be used as the virus herein according to the present invention, particularly for transfection of Spodoptera frugiperda cells.

Plant cell cultures of cotton, corn, potato, soybean, petunia, tomato, and tobacco can be utilized as hosts. Typically, plant cells are transfected by incubation with certain strains of the bacterium Agrobacterium tumefaciens, which has been previously manipulated to contain the SCA1 DNA. During incubation of the plant cell culture with A. tumefaciens, the SCA1 DNA is transferred to the plant cell host such that it is transfected and will, under appropriate conditions, express the SCA1 DNA. In addition, regulatory and signal sequences compatible with plant cells are available, such as the nopaline synthase promoter and polyadenylation signal sequences (Depicker et al., J. Mol. Appl. Gen., 1, 561 (1982)).

Vertebrate cells can also be used as hosts. Propagation of vertebrate cells in culture (tissue culture) has become a routine procedure in recent years. Examples of useful mammalian host cell lines are monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL 1651); human embryonic kidney line (293 or 293 cells subcloned for growth in suspension culture, Graham et al., J. Gen. Virol., 36, 59 (1977)); baby hamster kidney cells (BHK, ATCC CCL 10); Chinese hamster ovary cells/-DHFR (CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77, 4216 (1980)); mouse Sertoli cells (TM4, Mather, Biol. Reprod., 23, 243-251 (1980)); monkey kidney cells (CV1, ATCC CCL 70); African green monkey kidney cells (VERO-76, ATCC CRL-1587); human cervical carcinoma cells (HeLa, ATCC CCL 2); canine kidney cells (MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); human lung cells (WI38, ATCC CCL 75); human liver cells (Hep G2, HB 8065); mouse mammary tumor (MMT 060562, ATCC CCL 51); TRI cells (Mather et al., Annals N.Y. Acad. Sci., 383, 44-68 (1982)); MRC 5 cells; FS4 cells; and a human hepatoma line (Hep G2).

4. Transfection and Transformation

Host cells are transfected and preferably transformed with the above-described expression or cloning vectors of this invention and cultured in conventional nutrient media modified as appropriate for inducing promoters, selecting transformants, or amplifying the genes encoding the desired sequences.

Transfection refers to the taking up of an expression vector by a host cell whether or not any coding sequences are in fact expressed. Numerous methods of transfection are known to the ordinarily skilled artisan; for example, the calcium phosphate precipitation method and electroporation are commonly used. Successful transfection is generally recognized when any indication of the operation of the vector occurs within the host cell.

Transformation means introducing DNA into an organism so that the DNA is replicable, either as an extrachromosomal element or by chromosomal integration. Depending on the host cell used, transformation is done using standard techniques appropriate to such cells. Calcium chloride is generally used for prokaryotes or other cells that contain substantial cell-wall barriers. Infection with Agrobacterium tumefaciens can be used for transformation of certain plant cells. For mammalian cells without cell walls, the calcium phosphate precipitation method of Graham et al., Virology, 52, 456-457 (1978) is preferred. Transformations into yeast are typically carried out according to the method of Van Solingen et al., J. Bact., 130, 946 (1977) and Hsiao et al., Proc. Natl. Acad. Sci. (USA), 76, 3829 (1979). However, other methods for introducing DNA into cells, such as nuclear injection, electroporation, or protoplast fusion, may also be used.

5. Cell Culture

Prokaryotic cells used to produce the SCA1 gene product, ataxin-1, are cultured in suitable media, as described generally in Sambrook et al. The mammalian host cells used to produce the SCA1 gene product may be cultured in a variety of media. Commercially available media such as Ham's F10 (Sigma), Minimal Essential Medium (MEM, Sigma), RPMI-1640 (Sigma), and Dulbecco's Modified Eagle's Medium (DMEM, Sigma) are suitable for culturing the host cells. These media may be supplemented as necessary with hormones and/or other growth factors (such as insulin, transferrin, or epidermal growth factor), salts (such as sodium chloride, calcium, magnesium, and phosphate), buffers (such as HEPES), nucleosides (such as adenosine and thymidine), antibiotics (such as Gentamycin™ drug), trace elements (defined as inorganic compounds usually present at final concentrations in the micromolar range), and glucose or an equivalent energy source. Any other necessary supplements may also be included at appropriate concentrations that would be known to those skilled in the art. The culture conditions, such as temperature, pH, and the like, are those previously used with the host cell selected for expression, and will be apparent to the ordinarily skilled artisan. The host cells referred to in this disclosure encompass cells in in vitro culture as well as cells that are within a host animal.

6. Protein

The SCA1 gene encodes a novel protein, ataxin-1, a representative 
example of which is shown in Figure 15 with an estimated molecular weight of 
about 87 kD. It is to be understood that ataxin-1 represents a set of proteins 
produced from the SCA1 gene with its unstable CAG region. Ataxin-1 can be 
produced from cell cultures. With the aid of recombinant DNA techniques, 
synthetic DNA and cDNA coding for ataxin-1 can be introduced into 
microorganisms which can then be made to produce the peptide. It is also possible 
to manufacture ataxin-1 synthetically, in a manner such as is known for peptide 
syntheses. 
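The text gives an estimated molecular weight of about 87 kD but does not say how that figure was obtained. As a minimal sketch, and not the authors' method, the average mass of an unmodified polypeptide can be estimated from its sequence by summing commonly tabulated average residue masses and adding one water for the free termini. The function names are hypothetical, and the example sequence is illustrative only, since the Figure 15 sequence is not reproduced here. The rough rule of about 110 Da per residue is included as a sanity check; apparent sizes from SDS-PAGE can differ from sequence-derived values.

```python
# Commonly tabulated average residue masses (Da): amino acid mass minus one water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass_da(sequence: str) -> float:
    """Estimate the average mass (Da) of an unmodified polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


def residues_for_mass(mass_kda: float, mean_residue_da: float = 110.0) -> int:
    """Rough residue count implied by a mass, using ~110 Da per residue."""
    return round(mass_kda * 1000 / mean_residue_da)


if __name__ == "__main__":
    print(round(average_mass_da("MKQQQQHA"), 2))  # illustrative peptide only
    print(residues_for_mass(87))                  # residues implied by ~87 kD
```

Post-translational modifications, such as the phosphorylation or glycosylation discussed below, are not accounted for by this estimate.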



Ataxin-1 is preferably recovered from host cell lysates as a
cytosolic polypeptide, although it can also be recovered from the culture medium
as a secreted polypeptide when expressed with a secretory signal.

Ataxin-1 can be purified from recombinant cell proteins or
polypeptides to obtain preparations that are substantially homogeneous as ataxin-1.
As a first step, the culture medium or lysate is centrifuged to remove particulate cell
debris. The membrane and soluble protein fractions are then separated. The ataxin-1
may then be purified from the soluble protein fraction or from the membrane
fraction of the culture lysate, depending on whether the ataxin-1 is membrane
bound. If necessary, ataxin-1 is further purified from contaminant soluble proteins
and polypeptides; the following procedures are exemplary of suitable
purification procedures: fractionation on immunoaffinity or ion-exchange
columns; ethanol precipitation; reverse phase HPLC; chromatography on silica or on
an anion-exchange resin such as DEAE; chromatofocusing; SDS-PAGE; ammonium
sulfate precipitation; gel filtration using, for example, Sephadex G-75; and ligand
affinity chromatography, using, e.g., protein A Sepharose columns to remove
contaminants such as IgG.

Ataxin-1 variants in which residues have been deleted, inserted, or
substituted are recovered in the same fashion as native ataxin-1, taking account of
any substantial changes in properties occasioned by the variation. For example,
preparation of an ataxin-1 fusion with another protein or polypeptide, e.g., a bacterial
or viral antigen, facilitates purification; an immunoaffinity column containing
antibody to the antigen can be used to adsorb the fusion polypeptide.
Immunoaffinity columns such as a rabbit polyclonal anti-ataxin-1 column can be
employed to adsorb the ataxin-1 variant by binding it to at least one remaining
immune epitope. Alternatively, the ataxin-1 may be purified by affinity
chromatography using a purified ataxin-1-IgG coupled to a (preferably) immobilized
resin such as Affi-Gel 10 (Bio-Rad, Richmond, CA) or the like, by means well
known in the art. A protease inhibitor such as phenylmethylsulfonyl fluoride
(PMSF) also may be useful to inhibit proteolytic degradation during purification,
and antibiotics may be included to prevent the growth of adventitious contaminants.

Covalent modifications of ataxin-1 are included within the scope of
this invention. Both native ataxin-1 and amino acid sequence variants of ataxin-1
may be covalently modified. Covalent modifications included within the scope of
this invention include those producing one or more ataxin-1 fragments. Ataxin-1
fragments having any number of amino acid residues may be conveniently prepared
by chemical synthesis, by enzymatic or chemical cleavage of the full-length or
variant ataxin-1 polypeptide, or by cloning and expressing only portions of the
SCA1 gene. Other types of covalent modifications of ataxin-1 or fragments thereof
are introduced into the molecule by reacting targeted amino acid residues of
ataxin-1 or fragments thereof with a derivatizing agent capable of reacting with
selected side chains or the N- or C-terminal residues.

For example, cysteinyl residues most commonly are reacted with
α-haloacetates (and corresponding amines), such as iodoacetic acid or iodoacetamide,
to give carboxymethyl or carboxyamidomethyl derivatives. Cysteinyl residues also
are derivatized by reaction with bromotrifluoroacetone, α-bromo-β-(5-imidozoyl)propionic
acid, iodoacetyl phosphate, N-alkylmaleimides, 3-nitro-2-pyridyl disulfide,
methyl 2-pyridyl disulfide, p-chloromercuribenzoate, 2-chloromercuri-4-nitrophenol,
or chloro-7-nitrobenzo-2-oxa-1,3-diazole.

Histidyl residues are derivatized by reaction with
diethylpyrocarbonate or p-bromophenacyl bromide. Lysinyl and amino-terminal residues
are derivatized with succinic or other carboxylic acid anhydrides and with imidoesters
such as methyl picolinimidate; pyridoxal phosphate; pyridoxal; chloroborohydride;
trinitrobenzenesulfonic acid; O-methylisourea; 2,4-pentanedione; and transaminase-catalyzed
reaction with glyoxylate. Arginyl residues are modified by reaction with
phenylglyoxal, 2,3-butanedione, 1,2-cyclohexanedione, and ninhydrin, among
others.

Specific modification of tyrosyl residues may be made, with
particular interest in introducing spectral labels into tyrosyl residues by reaction
with aromatic diazonium compounds or tetranitromethane. Most commonly,
N-acetylimidazole and tetranitromethane are used to form O-acetyl tyrosyl species and
3-nitro derivatives, respectively. Tyrosyl residues are iodinated using ¹²⁵I or ¹³¹I to
prepare labeled proteins for use in radioimmunoassay, the chloramine T method
described above being suitable.

Carboxyl side groups (aspartyl or glutamyl) are selectively modified
by reaction with carbodiimides (R-N=C=N-R'), where R and R' are different alkyl
groups, such as 1-cyclohexyl-3-(2-morpholinyl-4-ethyl)carbodiimide or
1-ethyl-3-(4-azonia-4,4-dimethylpentyl)carbodiimide. Furthermore, aspartyl and glutamyl
residues are converted to asparaginyl and glutaminyl residues by reaction with
ammonium ions.

Derivatization with bifunctional agents is useful for crosslinking
ataxin-1 to a water-insoluble support matrix or surface for use in the method for
purifying anti-ataxin-1 antibodies, and vice versa. Commonly used crosslinking
agents include, e.g., 1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde,
N-hydroxysuccinimide esters (for example, esters with 4-azidosalicylic acid),
homobifunctional imidoesters, including disuccinimidyl esters such as
3,3'-dithiobis(succinimidylpropionate), and bifunctional maleimides such as
bis-N-maleimido-1,8-octane. Derivatizing agents such as
methyl-3-[(p-azidophenyl)dithio]propioimidate yield photoactivatable intermediates that are
capable of forming crosslinks in the presence of light. Alternatively, reactive
water-insoluble matrices such as cyanogen bromide-activated carbohydrates and the
reactive substrates are employed for protein immobilization.

Glutaminyl and asparaginyl residues are frequently deamidated to the
corresponding glutamyl and aspartyl residues, respectively. These residues are
deamidated under neutral or basic conditions. The deamidated form of these
residues falls within the scope of this invention.

Other modifications include hydroxylation of proline and lysine,
phosphorylation of hydroxyl groups of seryl or threonyl residues, methylation of the
α-amino groups of lysine, arginine, and histidine side chains, acetylation of the
N-terminal amine, amidation of any C-terminal carboxyl group, and glycosylation of
any suitable residue.

7. Antibodies

The present invention also relates to polyclonal or monoclonal
antibodies raised against ataxin-1 or ataxin-1 fragments (preferably fragments
having 8-40 amino acids, more preferably 10-20 amino acids, that form the surface
of the folded protein), or variants thereof, and to diagnostic methods based on the
use of such antibodies, including but not limited to Western blotting and ELISA
(enzyme-linked immunosorbent assay).

Polyclonal antibodies to the SCA1 polypeptide generally are raised in
animals by multiple subcutaneous (sc) or intraperitoneal (ip) injections of ataxin-1,
ataxin-1 fragments, or variants thereof, and an adjuvant. The polypeptide can be a
in the protein sequence that is on the surface of the folded protein and is thus likely
to be antigenic. It may be useful to conjugate the SCA1 polypeptide (including
fragments containing a specific amino acid sequence) to a protein that is
immunogenic in the species to be immunized, e.g., keyhole limpet hemocyanin,
serum albumin, bovine thyroglobulin, or soybean trypsin inhibitor, using a
bifunctional or derivatizing agent, for example, maleimidobenzoyl sulfosuccinimide
ester (conjugation through cysteine residues), N-hydroxysuccinimide (through
lysine residues), glutaraldehyde, succinic anhydride, SOCl2, or R1N=C=NR, where R
and R1 are different alkyl groups. Conjugates also can be made in recombinant cell
culture as protein fusions. Also, aggregating agents such as alum are used to
enhance the immune response.

The route and schedule of immunizing a host animal or removing and
culturing antibody-producing cells are variable and are generally in keeping with
established and conventional techniques for antibody stimulation and production.
While mice are frequently employed as the host animal, it is contemplated that any
mammalian subject, including human subjects, or antibody-producing cells obtained
therefrom can be manipulated according to the processes of this invention to serve
as the basis for production of mammalian, including human, hybrid cell lines.
Preferably, rabbits are used to raise antibodies against ataxin-1.

Animals are typically immunized against the immunogenic
conjugates or derivatives by combining about 10 μg to about 1 mg of ataxin-1 with
about 2-3 volumes of Freund's complete adjuvant and injecting the solution
intradermally at multiple sites. About one month later the animals are boosted with
about 1/5 to about 1/10 the original amount of conjugate in Freund's complete
adjuvant (or other suitable adjuvant) by subcutaneous injection at multiple sites.
About 7 to 14 days later the animals are bled and the serum is assayed for
anti-ataxin-1 polypeptide titer.
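The dosing in this immunization schedule reduces to simple arithmetic: 2-3 volumes of adjuvant per volume of antigen solution, and a boost of 1/10 to 1/5 of the primary amount. The sketch below is illustrative only; the function and field names are hypothetical, and it assumes the antigen is supplied as a solution of known volume, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImmunizationPlan:
    primary_ug: float                 # antigen in the primary injection
    adjuvant_ul: tuple[float, float]  # Freund's complete adjuvant, 2-3 volumes
    boost_ug: tuple[float, float]     # boost about one month later, 1/10 to 1/5 of primary


def plan_immunization(primary_ug: float, antigen_volume_ul: float) -> ImmunizationPlan:
    """Dose arithmetic for the schedule described in the text (illustrative only)."""
    adjuvant = (2 * antigen_volume_ul, 3 * antigen_volume_ul)
    boost = (primary_ug / 10, primary_ug / 5)
    return ImmunizationPlan(primary_ug, adjuvant, boost)


if __name__ == "__main__":
    print(plan_immunization(primary_ug=100, antigen_volume_ul=200))
```

For a hypothetical 100 μg primary dose in 200 μl, this gives 400-600 μl of adjuvant and a boost of 10-20 μg.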

Serum antibodies (IgG) are purified via protein purification protocols
that are well known in the art. Antibody/antigen reactivity is analyzed using
Western blotting, wherein suspected antigens are blotted to a nitrocellulose filter,
exposed to potential antibodies, and allowed to bind under defined conditions.
See Gershoni et al., Anal. Biochem., 131, 1-15 (1983). The protein antigens can
then be sequenced using standard sequencing methods directly from the
antibody/antigen complexes on the nitrocellulose support.

Monoclonal antibodies are prepared by recovering immune cells,
typically spleen cells or lymphocytes from lymph node tissue, from immunized
animals (usually mice) and immortalizing the cells in conventional fashion, e.g., by
fusion with myeloma cells. The hybridoma technique described originally by
Kohler et al., Eur. J. Immunol., 6, 511 (1976) has been widely applied to produce
hybrid cell lines that secrete high levels of monoclonal antibodies against many
specific antigens. It is possible to fuse cells of one species with another. However,
it is preferable that the source of the immunized antibody-producing cells and the
myeloma be from the same species. While mouse monoclonal antibodies are
routinely used, the present invention is not so limited; human antibodies may be
used and may prove to be preferable. Such antibodies can be obtained by using
human hybridomas. Cote et al., Monoclonal Antibodies and Cancer Therapy, A.R. Liss,
Ed., p. 77 (1985).

The secreted antibody is recovered from tissue culture supernatant by
conventional methods such as precipitation, ion exchange chromatography, affinity
chromatography, or the like. The antibodies described herein are also recovered
from hybridoma cell cultures by conventional methods for purification of IgG or
IgM, as the case may be, that heretofore have been used to purify these
immunoglobulins from pooled plasma, e.g., ethanol or polyethylene glycol
precipitation procedures. The purified antibodies are sterile filtered, and optionally
are conjugated to a detectable marker such as an enzyme or spin label for use in
diagnostic assays of ataxin-1 in test samples.

Techniques for creating recombinant DNA versions of the antigen-binding
regions of antibody molecules (known as Fab fragments), which bypass the
generation of monoclonal antibodies, are encompassed within the practice of this
invention. Antibody-specific messenger RNA molecules are extracted from
immune system cells taken from an immunized animal, transcribed into
complementary DNA (cDNA), and the cDNA is cloned into a bacterial expression
system.

The anti-ataxin-1 antibody preparations of the present invention are
specific to ataxin-1 and do not react immunochemically with other substances in a
manner that would interfere with a given use. For example, they can be used to
screen for the presence of ataxin-1 in tissue extracts to determine tissue-specific
expression levels of ataxin-1.

The present invention also encompasses an immunochemical assay
that involves subjecting antibodies directed against ataxin-1 to reaction with the
ataxin-1 present in a sample to form an (ataxin-1/anti-ataxin-1) immune
complex, the formation and amount of which are, respectively, qualitative and
quantitative measures of the presence of ataxin-1 in the sample. The addition of
other reagents capable of biospecifically reacting with constituents of the
protein/antibody complex, such as anti-antibodies provided with analytically
detectable groups, facilitates detection of ataxin-1 and is especially useful for
quantitating the level of ataxin-1 in biological samples. Ataxin-1/anti-ataxin-1
complexes can also be subjected to amino acid sequencing using methods well known
in the art to determine the length of a polyglutamine region and thereby provide
information about the likelihood of affliction with spinocerebellar ataxia and the
likely age of onset. Competitive inhibition and non-competitive methods,
precipitation methods, heterogeneous and homogeneous methods, various methods
named according to the analytically detectable group employed,
immunoelectrophoresis, particle agglutination, immunodiffusion, and
immunohistochemical methods employing labeled antibodies may all be used in
connection with the immunoassay described above.
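Once a sequence is available, the length of the polyglutamine region can be read off by finding the longest uninterrupted run of glutamine (Q) residues or, at the DNA level, of consecutive in-frame CAG codons. The following is a minimal sketch with hypothetical function names and example sequences. It reports only the longest uninterrupted run, and it makes no claim about which lengths correspond to disease or to a given age of onset.

```python
def longest_residue_run(protein: str, residue: str = "Q") -> int:
    """Length of the longest uninterrupted run of `residue` in a protein sequence."""
    best = current = 0
    for aa in protein.upper():
        current = current + 1 if aa == residue else 0
        best = max(best, current)
    return best


def longest_codon_run(dna: str, codon: str = "CAG", frame: int = 0) -> int:
    """Longest run of consecutive in-frame copies of `codon` in a coding sequence."""
    dna = dna.upper()
    best = current = 0
    for i in range(frame, len(dna) - 2, 3):
        current = current + 1 if dna[i:i + 3] == codon else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    print(longest_residue_run("MKQQQQHQQQQQQA"))             # hypothetical peptide
    print(longest_codon_run("ATGCAGCAGCATCAGCAGCAGTAA"))   # hypothetical coding DNA
```

Whether a tract interrupted by another residue should be reported as one region or as separate runs is a separate decision. This sketch resets the count at any interruption.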

The invention has been described with reference to various specific
and preferred embodiments and will be further described by reference to the
following detailed examples. It is understood, however, that there are many
extensions, variations, and modifications of the basic theme of the present invention
beyond those shown in the examples and detailed description, which are within the
spirit and scope of the present invention.
Experimental Section
I. The Gene for SCA1 Maps Centromeric to D6S89

To confirm the position of SCA1 with respect to D6S89 and to identify
closer flanking markers, two dinucleotide repeat polymorphisms, D6S109 and
D6S202, were used. Using YAC clones isolated in the D6S89 region, three
additional dinucleotide repeat polymorphisms were identified, one of which
(AM10GA) showed no recombination with SCA1 and confirmed that D6S89 is
telomeric to SCA1. The dinucleotide repeat at D6S109 revealed six recombination
events with SCA1 and identified D6S109 as the other flanking marker at the
centromeric end. Linkage analysis, physical mapping data as discussed below, and
analysis of recombination events demonstrated that the order of markers is as
follows: centromere - D6S109 - AM10GA/SCA1 - D6S89 - SB1 - LR40 - D6S202 -
telomere.

A. Materials and Methods

1. SCA1 Kindreds

Nine large SCA1 families were used in the present study. Clinical
findings and linkage data demonstrating that these families segregated SCA1 have
been previously reported. See J.F. Jackson et al., N. Engl. J. Med., 296, 1138-1141
(1977); B.J.B. Keats et al., Am. J. Hum. Genet., 49, 972-977 (1991); L.P.W. Ranum
et al., Am. J. Hum. Genet., 49, 31-41 (1991); and H.Y. Zoghbi et al., Am. J. Hum.
Genet., 49, 23-30 (1991). Analysis of polymorphisms at the loci D6S109,
AM10GA, SB1, LR40, and D6S202 was performed on individuals from these
kindreds.

The Houston (TX-SCA1) kindred included 106 individuals, of whom
57 (25 affected) were genotyped. See H.Y. Zoghbi et al., Ann. Neurol., 23, 580-584
(1988). Patients symptomatic at the time of examination, as well as asymptomatic
individuals who had both a symptomatic child and a symptomatic parent, were
classified as "affected." In this kindred, a deceased individual previously assigned
as affected (from family history data) was reassigned an unknown status after
review of medical records. This reassignment eliminated what was previously
thought to be a recombination event between SCA1 and D6S89 in the TX-SCA1
kindred. To maximize the amount of information available for linkage analysis, the
two copies of chromosome 6 were separated in somatic cell hybrids for 15 affected
individuals and one unaffected individual from the TX-SCA1 kindred. See H.Y. Zoghbi
et al., Am. J. Hum. Genet., 44, 255-263 (1989). The Louisiana (LA-SCA1) kindred
included 50 individuals, of whom 26 (8 affected) were genotyped. See B.J.B. Keats
et al., Am. J. Hum. Genet., 49, 972-977 (1991). The Minnesota (MN-SCA1)
kindred included 175 individuals, of whom 106 (17 affected) were genotyped. See
J.L. Haines et al., Neurology, 34, 1542-1548 (1984); and L.P.W. Ranum et al., Am.
J. Hum. Genet., 49, 31-41 (1991). The Michigan (MI-SCA1) kindred included 201
individuals, of whom 127 (25 affected) were genotyped. See H.E. Nino et al.,
Neurology, 12-20 (1980). The Mississippi (MS-SCA1) kindred included 84
individuals, of whom 37 (17 affected) were genotyped. See J.F. Jackson et al., N.
Engl. J. Med., 296, 1138-1141 (1977).

Four Italian families segregating SCA1 were analyzed; their clinical
phenotype and HLA linkage data were reported previously. See M. Spadaro et al.,
Acta Neurol. Scand., 257-265 (1992). Three families originated in the Calabria
Region (Southern Italy): family IT-P, with 135 members, of whom 80 (21 affected)
were genotyped (for computational reasons, the family was subdivided into 3
different pedigrees, RM, VI, and FB, and only one of the 3 consanguinity loops was
considered); family IT-NS, with 43 members, of whom 27 (7 affected) were typed; and
family IT-NS, with 51 members, of whom 16 (3 affected) were typed. The fourth
family, IT-MR, originated from Latium and consisted of 17 individuals, of whom 10
(4 affected) were genotyped.

2. CEPH Families

The 40 CEPH reference families were genotyped at the D6S109, LR40,
and D6S202 loci in order to provide a large number of informative meioses for
marker-marker linkage analyses. Markers AM10GA and SB1 flank D6S89, having
been isolated from a yeast artificial chromosome (YAC) contig built bidirectionally
from D6S89 (see below). A subset of 18 CEPH families, which defined 26
recombinants between D6S109 and D6S89, was genotyped at AM10GA and SB1 in
order to determine the order of AM10GA, D6S89, and SB1 with respect to D6S109.

3. Cloning of Sequences Containing Dinucleotide Repeats

The identification and description of polymorphic dinucleotide repeats
at the D6S109 and D6S202 loci have been previously reported. See L.P.W. Ranum
et al., Nucleic Acids Res., 19, 1171 (1991); and F. LeBorgne-Demarquoy et al.,
Nucleic Acids Res., 19, 6060 (1991).

DNA fragments containing dinucleotide repeats were cloned at LR40
and SB1 from yeast artificial chromosome (YAC) clones at the LR40 and FLB1
loci, respectively (see below). DNA from each YAC clone was amplified in a 50 μl
reaction containing 20 ng DNA, a single Alu primer (see below), 50 mM KCl, 10
mM Tris-Cl pH 8.3, 1.25 mM MgCl2, 200 or 250 μM dNTPs, 0.01% (w/v) gelatin,
and 1.25 units Thermus aquaticus DNA polymerase (Taq polymerase, Perkin
Elmer, Norwalk, CT). For amplification of FLB1 YAC DNA, a primer
complementary to the 5' end of the Alu consensus sequence (Oncor Laboratories,
Gaithersburg, MD), designated SAL1 (5'-AGGAGTGAGCCACCGCACCCAGCC-3'),
was used at a final concentration of 0.6 μM. For amplification of LR40 YAC DNA,
0.2 μM primer PDJ34 was used. See C. Breukel et al., Nucleic Acids Res., 18, 3097
(1990). Samples were overlaid with mineral oil, denatured at 94°C for 5 minutes,
then subjected to 30 cycles of 1 minute denaturation at 94°C, 1 minute annealing at
55°C, and 5 minutes extension at 72°C. The last extension step was lengthened to
10 minutes. Electrophoresis of 15 μl of PCR products was performed on a 1.5%
agarose gel, which was Southern blotted and hybridized with a probe prepared by
random-hexamer-primed labelling of synthetic poly(dG-dT)-poly(dA-dC)
(Pharmacia, Piscataway, NJ) using [α-32P]dCTP, as described by A.P. Feinberg et
al., Anal. Biochem., 137, 266-267 (1984). Fragments hybridizing with the
dinucleotide repeat probe were identified and subsequently purified by
electrophoresis on a low-melt agarose gel. Fragments were
excised and reamplified by PCR as above.
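Writing the Alu-primed cycling program out as data makes its total programmed hold time easy to check. The program consists of 5 minutes of initial denaturation, followed by 30 cycles of 1 + 1 + 5 minutes, with the final extension lengthened to 10 minutes. This is a minimal sketch with hypothetical names. It counts only hold times and ignores ramp times, which depend on the thermal cycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    minutes: float


def alu_pcr_program(cycles: int = 30, final_extension_min: float = 10.0) -> list[Step]:
    """Expand the cycling program described in the text into explicit steps."""
    steps = [Step(94, 5)]  # initial denaturation
    for cycle in range(cycles):
        last = cycle == cycles - 1
        steps += [
            Step(94, 1),                                   # denaturation
            Step(55, 1),                                   # annealing
            Step(72, final_extension_min if last else 5),  # extension
        ]
    return steps


def total_hold_minutes(steps: list[Step]) -> float:
    return sum(s.minutes for s in steps)


if __name__ == "__main__":
    program = alu_pcr_program()
    print(len(program), total_hold_minutes(program))
```

Under these assumptions the hold times sum to 5 + 29 × 7 + 12 = 220 minutes.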
For LR40, reamplified DNA was repurified by low-melt gel
electrophoresis, and DNA was extracted from excised bands by passage through a
glass wool spin column as described by D.M. Heery et al., Trends Genet., 173
(1990). A purified 1.2-kb fragment was cloned into pBluescript plasmid modified
as a "T-vector" as described by D. Marchuk et al., Nucleic Acids Res., 19, 1154
(1990). From this clone, a 0.6-kb HincII restriction fragment containing a GT repeat
was subcloned into pBluescript plasmid and sequenced on an Applied Biosystems,
Inc. (Foster City, CA) automated sequencer.
For SB1, a reamplified 1-kb fragment was ethanol precipitated and
blunt-end cloned into pBluescript plasmid. Plasmid DNA was isolated and PCR
amplified in one reaction with M13 Reverse primer plus BamGT primer
(5'-CCCGGATCCTGTGTGTGTGTGTGTGTG-3') and in a second reaction with M13
Universal primer plus BamCA primer
(5'-CCCGGATCCACACACACACACACACAC-3'). See C.A. Feener et al., Am. J.
Hum. Genet., 48, 621-627 (1991). PCR conditions were as above, except that primers
were used at 1 μM concentration; 2.5 units Taq polymerase and approximately 30
ng DNA were used per reaction, with final reaction volumes of 100 μl and an
annealing temperature of 50°C. Products were precipitated, resuspended, and
digested with BamHI (product of Universal primer reaction) or BamHI and HincII
(product of Reverse primer reaction). These two fragments were cloned into
pBluescript plasmid and sequenced as above.
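Both tailed primers carry a 5' BamHI recognition site (GGATCC) followed by a dinucleotide tail. The tail of BamGT is the reverse complement of a CA repeat, so it can anneal to a (CA)n strand. The short sketch below checks these properties from the primer sequences as printed. The helper names are hypothetical, and the check covers sequence composition only, not annealing behavior.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence (cleaves G^GATCC)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Tailed primers as printed in the text, 5' to 3'.
PRIMERS = {
    "BamGT": "CCCGGATCCTGTGTGTGTGTGTGTGTG",
    "BamCA": "CCCGGATCCACACACACACACACACAC",
}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def describe_tailed_primer(seq: str) -> tuple[int, str, int, bool]:
    """Return (BamHI site offset, repeat unit, copies, tail is a perfect repeat)."""
    offset = seq.find(BAMHI_SITE)
    if offset < 0:
        raise ValueError("no BamHI site found")
    tail = seq[offset + len(BAMHI_SITE):]
    unit = tail[:2]
    copies = len(tail) // 2
    return offset, unit, copies, tail == unit * copies


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        offset, unit, copies, perfect = describe_tailed_primer(seq)
        print(name, offset, unit, copies, perfect, reverse_complement(unit * 3))
```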

Dinucleotide repeats were cloned at AM10 from a YAC containing
this locus. A λFixII library was constructed using DNA from this yeast clone, and
human clones were identified by filter hybridization using human placental DNA as
a probe. A gridded array of these human clones was grown, and filters containing
DNA from these clones were hybridized with a 32P-labelled poly(dG-dT)-poly(dA-dC)
probe as described above. DNA was prepared from positive clones, digested
with various restriction enzymes, and analyzed by agarose gel electrophoresis.
Southern blotting and hybridization were carried out with the poly(dG-dT)-poly(dA-dC)
probe. A 1-kb fragment hybridizing with the dinucleotide repeat probe was
identified, cloned into M13, and sequenced.

4. PCR Analysis

Primer sequences and concentrations, and PCR cycle times used for
amplification of dinucleotide repeat sequences from human genomic DNA, are
presented in Table 1. For the LR40 polymorphism, primer set "A" was used for
analysis of the TX-SCA1, LA-SCA1, and MS-SCA1 kindreds, while primer set "B"
was used for all other kindreds. Buffer compositions were as follows: 50 mM KCl,
10 mM Tris-Cl pH 8.3, 1.25 mM MgCl2 (1.5 mM MgCl2 for AM10GA), 250 μM
dNTPs (200 μM dNTPs for AM10GA), 0.01% (w/v) gelatin, and 0.5-0.625 unit
Taq polymerase. For the LR40 analysis, 2% formamide was included in the PCR
buffer. When primer set B was used for LR40 analysis, 125 μM dNTPs, 1.5 mM
MgCl2, and 1 unit Taq polymerase were used. All reaction volumes were 25 μl and
contained 40 ng genomic DNA. Four microliters of each reaction was mixed with 2
μl formamide loading buffer, denatured at 90-100°C for 3 minutes, and cooled on ice,
and 2-4 μl was used for electrophoresis on a 4% or 6% polyacrylamide/7.65 M urea
sequencing gel for 2-3 hours at 1100 V. PCR assay conditions have been reported
previously for D6S202 and D6S109. See L.P.W. Ranum et al., Nucleic Acids Res.,
19, 1171 (1991); and F. LeBorgne-Demarquoy et al., Nucleic Acids Res., 19, 6060
(1991).
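Assembling the 25 μl reactions described above requires a C1V1 = C2V2 calculation for each component. The sketch below takes its final concentrations from the text. Every stock concentration in it is a hypothetical example (a 10x buffer, a 25 mM MgCl2 stock, and a 2.5 mM dNTP stock), and reading "250 μM dNTPs" as 250 μM of each dNTP is also an assumption. The function and key names are illustrative.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, total_ul: float) -> float:
    """C1*V1 = C2*V2: microliters of stock that give final_conc in total_ul.

    final_conc and stock_conc must be expressed in the same units.
    """
    if stock_conc < final_conc:
        raise ValueError("stock is less concentrated than the target")
    return final_conc * total_ul / stock_conc


# name: (final concentration, assumed stock concentration)
COMPONENTS = {
    "buffer_10x": (1.0, 10.0),   # x-fold; KCl, Tris-Cl, gelatin (assumed premixed)
    "MgCl2": (1.25, 25.0),       # mM
    "dNTPs": (250.0, 2500.0),    # uM, assumed to mean each dNTP
}


def reaction_recipe(total_ul: float = 25.0) -> dict[str, float]:
    recipe = {name: stock_volume_ul(final, stock, total_ul)
              for name, (final, stock) in COMPONENTS.items()}
    # Template DNA, primers, Taq polymerase, and water share the remaining volume.
    recipe["remainder"] = total_ul - sum(recipe.values())
    return recipe


if __name__ == "__main__":
    print(reaction_recipe())
```

Variant conditions, such as 1.5 mM MgCl2 and 200 μM dNTPs for AM10GA, only require changing the final values in `COMPONENTS`.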
Table 1.

Primers and PCR conditions for amplification of dinucleotide repeat sequences

Marker/Type | Primers (a) | PCR steps | Cycles
AM10GA/(GA)n | AAGTCAGCCTCTACTCTTTGTTGA; CTTGGAGCAGTCTGTAGGGAG | 94°C for 30 sec; 55°C for 30 sec; 72°C for 30 sec | 30
SB1/(GT)n | TGAAGTGATGTGCTCTGTTC; AAAGGGGTAGAGGAAATGAG | 94°C for 60 sec; 60°C for 60 sec; 72°C for 60 sec | 30
LR40/(GT)n, set A | AGGAGAGGGGTCATGAGTTG; GGCTCATGAATACATTACATGAAG | 94°C for 60 sec; 58°C for 60 sec; 72°C for 60 sec | 25
LR40/(GT)n, set B | CTCATTCACCTTAGAGACAAATGGATAG; ATGGTATAGGGATTTTNCCAAACCTG | 94°C for 60 sec; 60°C for 60 sec; 72°C for 45 sec | 27

(a) Primers are shown as 5' to 3' sequence. The first primer of each pair was
end-labelled with [γ-32P]ATP and polynucleotide kinase. Primer concentrations
were 1 μM.
5. SCA1 Linkage Analysis

The D6S109, AM10GA, D6S89, SB1, LR40, and D6S202 markers
were analyzed for linkage to SCA1 using the computer program LINKAGE version
5.1, which includes the MLINK, ILINK, LINKMAP, CLODSCORE, and CMAP
programs. See G.M. Lathrop et al., Proc. Natl. Acad. Sci. USA, 81, 3443-3446
(1984). Age-dependent penetrance classes were assigned independently for each of
the families included in the analysis. Marker alleles were recoded to reduce the
number of alleles segregating in a family to four, five, or six alleles to simplify the
analysis. The allele frequencies for the various markers were based on the
frequencies of the alleles among the spouses in each family and were determined
separately for the two American black kindreds, for the Italian kindreds, and for the
Caucasian kindreds from Minnesota, Michigan, and Mississippi, with one
exception: the allele frequencies for D6S109 in the MI and MN kindreds were
based on the frequencies of the alleles in the CEPH families.
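The following sketch shows one way to estimate allele frequencies from spouses (unrelated individuals marrying into a pedigree) and to collapse the observed alleles into a small number of classes. The text does not say how alleles were recoded. This sketch therefore assumes one common approach: it keeps the most frequent alleles and pools the rest into a single class. The function names and genotype data are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from (allele1, allele2) genotypes of unrelated spouses."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def recode_alleles(freqs, max_alleles=4, pooled="pooled"):
    """Keep the (max_alleles - 1) most frequent alleles and pool the rest.

    Returns (mapping from original to recoded allele, recoded frequencies).
    This is one possible recoding scheme, assumed for illustration.
    """
    ranked = sorted(freqs, key=lambda a: (-freqs[a], str(a)))
    if len(ranked) <= max_alleles:
        return {a: a for a in ranked}, dict(freqs)
    kept = set(ranked[: max_alleles - 1])
    mapping = {a: (a if a in kept else pooled) for a in ranked}
    recoded = Counter()
    for allele, f in freqs.items():
        recoded[mapping[allele]] += f
    return mapping, dict(recoded)


if __name__ == "__main__":
    spouses = [(1, 2), (1, 3), (1, 4), (2, 5)]  # hypothetical genotypes
    freqs = allele_frequencies(spouses)
    print(recode_alleles(freqs, max_alleles=4))
```

Pooling preserves the total frequency. Because all pooled alleles become identical in state, some linkage information may be lost.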

Maximum LOD scores for the various markers were calculated with
the MLINK program by running the analyses separately for each of the
families, at θ values in increments of 0.0005 to 0.001, and then summing the
values across kindreds. The analyses were done separately to ensure that the
allele frequencies for the various markers were representative for each of the
ethnically diverse families. As a control, the recombination fractions at the
maximum LOD scores (Zmax) between each marker and SCA1 were calculated using
the ILINK program after the allele frequencies for each marker were set equal to one
another. In all cases the recombination fractions were the same, and Zmax values
were very similar to those reported in Table 5 below.
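MLINK computes likelihoods over full pedigrees, which is beyond the scope of a short example. For intuition only, the sketch below assumes the simplest case of phase-known, fully informative meioses. For R recombinants and N nonrecombinants, the two-point LOD score at recombination fraction θ is then Z(θ) = R log10 θ + N log10(1 - θ) + (R + N) log10 2. As in the text, scores are computed for each family on a θ grid and summed across families before taking the maximum. The family counts and function names are hypothetical.

```python
import math


def lod(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD for phase-known, fully informative meioses (0 <= theta <= 0.5)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0:
        if recombinants > 0:
            return -math.inf  # a recombinant is impossible at theta = 0
        return nonrecombinants * math.log10(2)
    n = recombinants + nonrecombinants
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1 - theta)
            + n * math.log10(2))


def max_summed_lod(families, step=0.0005):
    """Sum per-family LODs on a theta grid; return (theta_hat, Zmax)."""
    best_theta, best_z = 0.0, -math.inf
    for i in range(int(round(0.5 / step)) + 1):
        theta = min(i * step, 0.5)  # guard against float overshoot past 0.5
        z = sum(lod(r, nr, theta) for r, nr in families)
        if z > best_z:
            best_theta, best_z = theta, z
    return best_theta, best_z


if __name__ == "__main__":
    # Hypothetical (recombinants, nonrecombinants) counts for two families.
    print(max_summed_lod([(1, 9), (1, 9)]))
```

Summing per-family LOD scores before maximizing gives the same result as analyzing the families jointly with a common θ, because the families are independent.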

6. CEPH Linkage Analysis

Forty CEPH families were typed for the GT repeat markers D6S109,
D6S202, and LR40. The original alleles were recoded to five alleles. The SB1 and
AM10 markers were typed in a subset of the CEPH panel which defined 26
recombinants from 18 different families between D6S109 and D6S89. The
CLODSCORE program was used for the two-point analyses and CMAP was used
for the three- and four-point analyses. For the three-point and four-point analyses,
the interval between the mapped markers was fixed based on the two-point θm = θf
results. The likelihood of the location of the test locus (SCA1) was calculated at 10
different positions within each interval. The test for sex difference in the θ values
was performed using a χ² statistic,

$$\chi^2 = 2(\ln 10)\left[Z(\theta_m, \theta_f) - Z(\theta = \theta_m = \theta_f)\right],$$

where Z(θm, θf) is the overall Zmax for arbitrary θm and θf, while Z(θ = θm = θf) is
the Zmax constrained to θm = θf. Under homogeneity (H1), this statistic approximately
follows a χ² distribution with 1 d.f.; homogeneity is rejected when χ² > 3.84.
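The sex-difference test converts a difference between two maximum LOD scores into a likelihood-ratio χ² statistic with 1 degree of freedom. The conversion factor 2 ln 10 arises because LOD scores use base-10 logarithms. The sketch below uses hypothetical Zmax inputs and function names. Its p-value relies on the identity P(χ² with 1 d.f. > x) = erfc(√(x/2)), which holds only for one degree of freedom.

```python
import math

CRITICAL_1DF = 3.84  # rejection threshold quoted in the text (1 d.f., alpha = 0.05)


def sex_difference_chi2(z_unconstrained: float, z_constrained: float) -> float:
    """chi2 = 2 ln(10) [Z(theta_m, theta_f) - Z(theta = theta_m = theta_f)]."""
    return 2 * math.log(10) * (z_unconstrained - z_constrained)


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def reject_homogeneity(z_unconstrained: float, z_constrained: float) -> bool:
    return sex_difference_chi2(z_unconstrained, z_constrained) > CRITICAL_1DF


if __name__ == "__main__":
    chi2 = sex_difference_chi2(15.0, 14.0)  # hypothetical Zmax values
    print(chi2, chi2_1df_pvalue(chi2), reject_homogeneity(15.0, 14.0))
```

With these inputs, a one-unit LOD difference gives χ² ≈ 4.61, which exceeds 3.84, so homogeneity would be rejected. A half-unit difference gives χ² ≈ 2.30, which would not lead to rejection.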

B. Results 

1. Dinucleotide Repeat Cloning, Sequencing, and Analysis

Dinucleotide repeats SB1 and LR40 were amplified directly from
YAC clones by Alu-primed PCR, and the fragments containing dinucleotide repeats
were identified by hybridization. The PCR products were cloned either directly or
after further amplification using tailed poly(GT) or poly(CA) primers paired with an
Alu primer. In addition, two dinucleotide repeats were subcloned from a lambda
phage clone from a library constructed from a YAC at the AM10 locus.

Dinucleotide repeats from the SB1, LR40, and AM10 loci were
sequenced. At LR40, the cloned repeat sequence was (CA)16TA(CA)10. The AM10
fragment contained two repeat sequences separated by 45 bp of nonrepeat sequence.
The first repeat, designated AM10GA, was (GA)2ATGACA(GA)n. The second
repeat, designated AM10GT, was not used in this study because upon analysis of the
TX-SCA1 kindred it yielded the same information as the AM10GA repeat. The
AM10GT repeat consists of (GA)2AA(GA)6GTGA(GT)16AT(GT)5. Primer
information for AM10GT is available through the Genome Data Base. At SB1, the
repeat tract was not sequenced; only flanking sequence was determined.
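The repeat structures above use a compact notation in which (XY)k means k tandem copies of the unit XY. The sketch below expands such a string into a literal sequence so that its length can be checked. The parser is hypothetical and accepts only fully specified counts. A motif with an unspecified count, such as (GA)n, is therefore rejected rather than guessed. The LR40 structure is entered here as (CA)16TA(CA)10.

```python
import re

# Either "(UNIT)count" or a literal run of bases.
TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|([ACGT]+)")


def expand_repeat(notation: str) -> str:
    """Expand notation such as '(CA)16TA(CA)10' into the literal sequence."""
    notation = notation.replace(" ", "")
    pieces, pos = [], 0
    while pos < len(notation):
        m = TOKEN.match(notation, pos)
        if m is None:
            raise ValueError(f"cannot parse {notation[pos:]!r}")
        unit, count, literal = m.groups()
        pieces.append(unit * int(count) if unit else literal)
        pos = m.end()
    return "".join(pieces)


if __name__ == "__main__":
    for pattern in ("(CA)16TA(CA)10", "(GA)2AA(GA)6GTGA(GT)16AT(GT)5"):
        print(pattern, len(expand_repeat(pattern)))
```

Expanding the notation this way shows, for example, that the AM10GT structure spans 66 bp.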

Because the allele distributions of markers differ among races, allele frequencies are reported here separately for the CEPH kindreds (Caucasian) and the TX-SCA1 kindred (American black) (Table 2). CEPH allele frequencies were based on 72 independent chromosomes for SB1, 82 independent chromosomes for AM10, and the full set of 40 families for D6S109 and LR40. TX-SCA1 allele frequencies were based on 45 independent chromosomes for LR40, 43 for SB1, 45 for AM10, and 42 for D6S109.
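A minimal sketch of how allele frequencies can be tallied from a set of independent chromosomes, as was done separately for the CEPH and TX-SCA1 samples. The allele sizes in the example are hypothetical and do not come from Table 2; the expected heterozygosity function is an added, commonly used summary of marker informativeness, not a statistic reported in the text.

```python
from collections import Counter


def allele_frequencies(alleles: list[int]) -> dict[int, float]:
    """Relative frequency of each allele among independent chromosomes.

    Each entry in `alleles` is the allele (e.g. PCR product size in bp)
    carried by one independent chromosome.
    """
    counts = Counter(alleles)
    total = len(alleles)
    return {allele: n / total for allele, n in sorted(counts.items())}


def expected_heterozygosity(freqs: dict[int, float]) -> float:
    """1 - sum(p_i^2): probability that two random chromosomes differ."""
    return 1.0 - sum(p * p for p in freqs.values())


if __name__ == "__main__":
    # Hypothetical allele calls (bp) from four independent chromosomes
    freqs = allele_frequencies([150, 152, 152, 154])
    print(freqs, expected_heterozygosity(freqs))
```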
2. Genetic Linkage Data

a. CEPH families. To establish a well-defined genetic map of the SCA1 region, the newly isolated DNA markers were mapped using the CEPH reference families. Results of pairwise linkage analyses in CEPH kindreds are shown in Table 3. No recombination was observed between AM10GA and D6S89 (θ = 0.00, Zmax = 15.1) using a subset of the CEPH panel that defined 26 recombinants between D6S109 and D6S89. The markers D6S109 and LR40 are close to D6S89, with recombination fractions of 0.067 (Zmax = 71.4) and 0.04 (Zmax = 84.5), respectively.

Selected multipoint analyses were performed to position the newly isolated markers D6S109, LR40, and D6S202 with respect to markers previously mapped using the CEPH panel. The CMAP program was used for three- and four-point linkage analyses to position D6S109 relative to D6S88 and D6S89, and to position LR40 and D6S202 relative to each other and to D6S89 and F13A. For the three-point analyses, the D6S88 - D6S89 interval was fixed based on the two-point recombination fraction in CEPH, and the lod score was calculated at various recombination fractions. The order D6S88 - D6S109 - D6S89 is favored over the next most likely order by odds of 4 × 10^3 : 1 (Table 4). For the four-point analyses, both the D6S89 - D6S202 - F13A and the D6S89 - LR40 - F13A intervals were fixed based on the two-point recombination fractions; lod scores were then calculated for LR40 and D6S202 at various θ values on the respective fixed maps. The order D6S89 - LR40 - D6S202 - F13A is favored over the next most likely order in both analyses; the odds in favor were 400 : 1 when the position of LR40 was varied and 1 × 10^6 : 1 when the position of D6S202 was varied (Table 4).
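The odds quoted above follow directly from lod scores: because a lod score is a base-10 log likelihood ratio, a difference of d lod units between two locus orders corresponds to odds of 10^d : 1. The sketch below illustrates this; the two maximum lod values in the example are hypothetical, chosen only so that their difference reproduces odds of about 4 × 10^3 : 1.

```python
import math


def favored_order(max_lods: dict[str, float]) -> tuple[str, float]:
    """Return the best-supported locus order and its odds over the runner-up.

    A difference of d lod units between two orders corresponds to odds
    of 10**d : 1.
    """
    ranked = sorted(max_lods.items(), key=lambda kv: kv[1], reverse=True)
    (best, z_best), (_, z_next) = ranked[0], ranked[1]
    return best, 10.0 ** (z_best - z_next)


def odds_to_lod_difference(odds: float) -> float:
    """Inverse conversion: odds of X : 1 correspond to log10(X) lod units."""
    return math.log10(odds)


if __name__ == "__main__":
    # Odds reported in the text, expressed as lod-unit differences
    for odds in (4e3, 400, 1e6):
        print(f"{odds:g} : 1  ->  {odds_to_lod_difference(odds):.2f} lod units")
```

Read this way, the three reported results correspond to lod differences of roughly 3.6, 2.6, and 6.0 units between the favored order and the next most likely order.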

The order of AM10GA and D6S89 could not be determined using the D6S109/D6S89 CEPH recombinants. However, the order AM10GA - D6S89 - SB1 was deduced by characterizing overlapping yeast artificial chromosome clones containing these markers (see below). Furthermore, one end of this contig is present in a well-characterized radiation-reduced hybrid known to contain D6S109 and other centromeric markers, indicating the order D6S109 - AM10GA - D6S89 - SB1.
Table 3.

Pairwise linkage results in CEPH

Marker pair         θm = θf      Zmax         θm           θf           Zmax         χ²
HLA and D6S88       0.128        26.4         0.103        0.168        26.8         1.86
HLA and D6S109      0.126        48.4         0.062        0.176        51.0         12.1*
HLA and AM10        0.608        0.0440       0.301        0.500        0.246        0.929
HLA and D6S89       0.158        43.3         0.091        0.225        46.6         15.2*
HLA and SB1         0.574        0.0190       0.299        0.500        0.400        0.381
HLA and LR40        0.213        25.5         0.116        0.306        30.0         20.8*
HLA and HZ30        0.251        21.6         0.191        0.318        23.6         8.95*
HLA and F13A        0.291        8.81         0.255        0.326        9.14         1.52
D6S88 and D6S109    0.017        48.6         0.024        0.009        48.8         0.846
D6S88 and AM10      0.654        0.0290       0.499        0.696        0.047        0.0820
D6S88 and D6S89     0.086        36.1         0.076        0.098        36.2         0.0750
D6S88 and SB1       0.203        1.09         0.136        0.687        1.36         1.27
D6S88 and LR40      0.088        31.1         0.078        0.104        31.2         0.350
D6S88 and HZ30      0.135        30.4         0.124        0.152        30.4         0.340
D6S88 and F13A      0.180        10.2         0.158        0.217        10.3         0.626
D6S109 and AM10     0.730        0.933        0.170        0.502        1.67         3.39
D6S109 and D6S89    0.067        71.4         0.035        0.090        72.5         5.15*
D6S109 and SB1      0.742        1.95         0.113        0.501        4.32         [illegible]
D6S109 and LR40     0.109        50.6         0.050        0.152        52.9         10.5*
D6S109 and HZ30     0.162        36.6         0.147        0.174        36.7         0.515
D6S109 and F13A     0.207        14.4         0.211        0.204        14.4         0.0368
AM10 and D6S89      0.000        15.1         0.000        0.000        15.1         0.000
AM10 and SB1        0.000        13.2         0.000        0.000        13.2         0.000
AM10 and LR40       0.021        8.74         0.000        0.050        9.11         1.74
AM10 and HZ30       0.000        13.8         0.000        0.000        13.8         0.000
AM10 and F13A       0.135        3.48         0.042        0.253        4.39         4.16*
D6S89 and SB1       0.000        25.0         0.000        0.000        25.0         0.000
D6S89 and LR40      0.040        84.5         0.030        0.049        84.7         0.995
D6S89 and HZ30      0.078        76.0         0.075        0.077        76.0         0.0230
D6S89 and F13A      0.151        30.7         0.139        0.160        30.7         0.248
SB1 and LR40        [illegible]  [illegible]  [illegible]  [illegible]  14.5         0.350
SB1 and HZ30        0.026        17.5         0.032        0.020        17.5         0.0300
SB1 and F13A        0.136        4.80         0.119        0.155        4.84         0.170
LR40 and HZ30       [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
LR40 and F13A       0.131        [illegible]  0.121        0.140        29.2         0.189
HZ30 and F13A       0.109        38.4         0.122        0.106        38.4         0.0092

*Indicates that statistically significant differences were observed in the recombination fractions when the assumption of homogeneity (θm = θf) was rejected; that is, the probability that χ² > 3.84 with 1 degree of freedom occurs by chance is P < 0.05.
b. SCA1 kindreds. Results of pairwise linkage analyses in SCA1 kindreds are shown in Table 5. AM10GA, D6S89, and SB1 are all closely linked to SCA1. No recombination was observed between AM10GA and SCA1; the lod score is 42.1 at a recombination fraction of 0.00. The recombination fraction between D6S89 and SCA1 is 0.004 (lod score of 67.6), and that between SB1 and SCA1 is 0.007 (lod score of 39.5). D6S109, LR40, and D6S202 are also linked to SCA1, but at greater distances (recombination fractions of 0.04, 0.03, and 0.08, respectively). Based on genetic mapping in nine large kindreds, the SCA1 locus is very close to D6S89 and AM10GA, with a Zmax - 1 support interval of at most 0.02 in both cases.
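The Zmax - 1 support interval is the range of recombination fractions whose lod score lies within one unit of the maximum. The sketch below computes it by grid search under a deliberately simplified model of phase-known, fully informative meioses; the actual analyses used pedigree likelihoods, and the counts in the example (0 recombinants in 100 meioses) are hypothetical.

```python
import math


def lod(theta: float, recombinants: int, meioses: int) -> float:
    """Phase-known lod score: log10[L(theta) / L(0.5)] for `meioses`
    fully informative meioses of which `recombinants` are recombinant."""
    nonrecombinants = meioses - recombinants
    z = meioses * math.log10(2.0)  # -log10 L(0.5) = N log10 2
    if recombinants:
        if theta == 0.0:
            return float("-inf")  # any recombinant rules out theta = 0
        z += recombinants * math.log10(theta)
    if nonrecombinants:
        z += nonrecombinants * math.log10(1.0 - theta)
    return z


def support_interval(recombinants: int, meioses: int,
                     drop: float = 1.0, step: float = 0.001):
    """Grid search for the Zmax - `drop` support interval on 0 <= theta <= 0.5.

    Returns (lower theta, upper theta, Zmax).
    """
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    scores = [lod(t, recombinants, meioses) for t in grid]
    z_max = max(scores)
    inside = [t for t, z in zip(grid, scores) if z >= z_max - drop]
    return min(inside), max(inside), z_max


if __name__ == "__main__":
    low, high, z_max = support_interval(recombinants=0, meioses=100)
    print(f"Zmax = {z_max:.2f}; 1-lod support interval: {low:.3f} to {high:.3f}")
```

For the hypothetical counts, Zmax is about 30.1 at θ = 0 and the support interval runs from 0 to about 0.022: more nonrecombinant meioses raise Zmax and narrow the interval.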
3. Analysis of Key Recombinants

One recombination event between D6S89 and SCA1 has been confirmed in an affected individual. This patient, individual MI-2 in Figure 4, was also recombinant at SB1, although uninformative at LR40 and D6S202. He carried a disease haplotype at the HLA, D6S109, and AM10 loci, demonstrating that SCA1 is centromeric to D6S89, as indicated by the rightmost arrow in Figure 4. To eliminate the possibility of a sample mix-up, the patient's DNA was re-extracted from a hair sample and retyped for D6S109, D6S89, D6S202, LR40, AM10GA, and SB1. The results from the hair sample matched those from the cell line originally established from the patient's blood. The patient's medical records were carefully re-examined, confirming that he did indeed have ataxia. In addition, his haplotypes were consistent with those of a sister and a daughter.

D6S109 lies centromeric to D6S89; six recombination events have been observed between D6S109 and SCA1, as shown in Figure 4. At present, D6S109 is the closest centromeric marker to SCA1. The arrows in Figure 4 denote the maximum region common to all affected chromosomes, and therefore the maximum possible region containing the SCA1 gene, which extends from D6S89 to D6S109.

No additional marker-SCA1 recombination events have been observed between D6S89 and SB1. Markers further telomeric to SB1 show additional recombination with SCA1: one recombination event between SCA1 and LR40 and three between SCA1 and D6S202. These events are depicted in Figure 4 (all recombination events depicted in Figure 4 are in affected individuals).
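The reasoning above, in which each affected chromosome restricts the gene to the segment it shares with the disease haplotype, can be sketched as an interval intersection. The marker order D6S109 - AM10GA - D6S89 - SB1 is taken from the text, with LR40 and D6S202 placed telomeric to SB1. The first chromosome in the example follows the description of individual MI-2; the second is a hypothetical chromosome standing in for one of the D6S109 recombinants. The sketch assumes each chromosome carries one contiguous shared segment.

```python
from typing import Optional, Sequence

# Marker order from centromere to telomere, as deduced in the text
MARKERS = ["D6S109", "AM10GA", "D6S89", "SB1", "LR40", "D6S202"]


def flanking_indices(shared: Sequence[Optional[bool]]) -> tuple[int, int]:
    """Indices of the closest recombinant markers around the shared segment.

    shared[i] is True if the affected chromosome carries the disease
    haplotype allele at MARKERS[i], False if it is recombinant there, and
    None if the marker is uninformative. -1 or len(MARKERS) means the
    segment is unbounded on that side. Assumes one contiguous shared segment.
    """
    carried = [i for i, s in enumerate(shared) if s is True]
    first, last = min(carried), max(carried)
    left = max((i for i in range(first) if shared[i] is False), default=-1)
    right = min((i for i in range(last + 1, len(shared)) if shared[i] is False),
                default=len(shared))
    return left, right


def critical_interval(chromosomes) -> tuple[Optional[str], Optional[str]]:
    """Intersect the shared segments of all affected chromosomes and return
    the (centromeric, telomeric) flanking markers, or None if unbounded."""
    lo, hi = -1, len(MARKERS)
    for shared in chromosomes:
        left, right = flanking_indices(shared)
        lo, hi = max(lo, left), min(hi, right)
    return (MARKERS[lo] if lo >= 0 else None,
            MARKERS[hi] if hi < len(MARKERS) else None)


if __name__ == "__main__":
    mi2 = [True, True, False, False, None, None]            # as described for MI-2
    d6s109_recombinant = [False, True, True, True, True, None]  # hypothetical
    print(critical_interval([mi2, d6s109_recombinant]))
```

Combining the two chromosomes bounds the gene between D6S109 and D6S89, matching the maximal region described above; additional affected chromosomes can only narrow the interval.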

II. Mapping and Cloning the Critical Region for the SCA1 Gene

A 2.5-Mb yeast artificial chromosome (YAC) contig was developed 
with the ultimate goal of defining and cloning the region likely to contain the SCA1 
gene (SCA1 critical region). 



A. Materials and Methods

1. Cell lines

1-7 is a human-hamster hybrid cell line that contains the short arm of chromosome 6 as its only human chromosome. See, H.Y. Zoghbi et al., Genomics, &, 352-357 (1990). R86, R78, R72, R54, and R17 are radiation-reduced hybrid cell lines retaining various portions of 6p22-p23. See, H.Y. Zoghbi et al., Genomics, %, 713-720 (1991). R54 retains markers known to be telomeric to D6S89, such as D6S202 and F13A.

2. Generation of new DNA markers and Sequence-Tagged Sites (STSs)

DNA from a radiation-reduced hybrid retaining D6S89 (R86) and DNAs from four radiation hybrids (R78, R72, R54, and R17) that do not retain D6S89 but retain markers immediately flanking D6S89 were used in comparative Alu-PCR to isolate region-specific DNA markers. See, D.L. Nelson et al., Proc. Natl. Acad. Sci. USA, 86, 6686-6690 (1989); and H.Y. Zoghbi et al., Genomics, %, 713-720 (1991). In addition, R78 was useful in eliminating markers derived from the centromeric region of 6p. H.Y. Zoghbi et al., Genomics, 2, 713-720 (1991). Alu-PCR was carried out using Alu primers 559 and 517 individually (D.L. Nelson et al., Proc. Natl. Acad. Sci. USA, 86, 6686-6690 (1989)) as well as PDJ34 (C. Breukel et al., Nucleic Acids Res., 18, 3097 (1990)). Alu-PCR fragments present in R86 but absent in R78, R72, R54, and R17 were identified and cloned into EcoRV-digested pBluescript II KS+ plasmid (Stratagene, La Jolla, CA) that had been modified using the T-vector protocol. See, D. Marchuk et al., Nucleic Acids Res., 12, 1154 (1990). Cloned fragments were sequenced on an Applied Biosystems, Inc. (Foster City, CA) automated sequencer to establish STSs.
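The comparative Alu-PCR screen above amounts to a set difference: keep the fragments amplified from the hybrid that retains D6S89 (R86) but absent from every hybrid that lacks it. A minimal sketch, representing each fragment by a hypothetical size in bp (in practice the comparison was made on gels, and fragment identity is not simply a size):

```python
def region_specific_fragments(target: set[int],
                              deleted: dict[str, set[int]]) -> list[int]:
    """Alu-PCR fragments present in the target hybrid but absent from every
    hybrid that does not retain the region of interest."""
    candidates = set(target)
    for fragments in deleted.values():
        candidates -= fragments  # drop anything also seen in a deleted hybrid
    return sorted(candidates)


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) for each hybrid
    r86 = {350, 420, 610, 780, 900}
    others = {"R78": {350, 900}, "R72": {350}, "R54": {610}, "R17": {900}}
    print(region_specific_fragments(r86, others))
```

In this toy example only the 420-bp and 780-bp fragments survive the comparison and would be taken forward for cloning and STS development.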

3. Isolation and Characterization of YAC clones 

The Washington University YAC library (B.H. Brownstein et al., Science, 244, 1348-1351 (1989)) and the CEPH YAC library (H.M. Albertsen et al., Proc. Natl. Acad. Sci. USA, 87, 4256-4260 (1990)) were screened using a PCR-based method. See, E.D. Green et al., Proc. Natl. Acad. Sci. USA, 87, 1213-1217 (1990); and T.J. Kwiatkowski et al., Nucleic Acids Res., 18, 7191-7192 (1990). PCR amplifications were carried out in a 25-50 µl final volume containing 50 mM KCl, 10 mM Tris-HCl pH 8.3, 1.25 mM MgCl2, 0.01% (w/v) gelatin, 250 µM of each dNTP, 1.25 units of AmpliTaq polymerase (Perkin-Elmer, Norwalk, CT), and 1 µM of each primer. PCR cycle conditions are specified in Table 6.
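The reaction recipe above translates into pipetting volumes with C1 × V1 = C2 × V2. The sketch below uses the final concentrations given in the text and a 25 µl reaction; every stock concentration, the 1 µl template volume, and the 10% overage are hypothetical choices made for illustration.

```python
# Final concentrations are those given in the text; every stock
# concentration below is a hypothetical example.
# name: (stock concentration, final concentration), same units per entry
COMPONENTS = {
    "KCl (1 M)": (1000.0, 50.0),                # mM
    "Tris-HCl pH 8.3 (1 M)": (1000.0, 10.0),    # mM
    "MgCl2 (25 mM)": (25.0, 1.25),              # mM
    "gelatin (1% w/v)": (1.0, 0.01),            # % w/v
    "dNTP mix (10 mM each)": (10000.0, 250.0),  # uM of each dNTP
    "primer 1 (20 uM)": (20.0, 1.0),            # uM
    "primer 2 (20 uM)": (20.0, 1.0),            # uM
}
TAQ_UNITS_PER_REACTION = 1.25  # from the text
TAQ_STOCK_U_PER_UL = 5.0       # hypothetical enzyme stock


def master_mix(n_reactions: int, reaction_ul: float = 25.0,
               template_ul: float = 1.0, overage: float = 0.1) -> dict[str, float]:
    """Volumes (ul) for a master mix covering n reactions plus overage.

    Uses C1 * V1 = C2 * V2 for each component. Template DNA is added to
    each tube separately, so water makes up the rest of
    (reaction volume - template volume).
    """
    scale = n_reactions * (1.0 + overage)
    volumes = {name: final / stock * reaction_ul * scale
               for name, (stock, final) in COMPONENTS.items()}
    volumes["AmpliTaq (5 U/ul)"] = TAQ_UNITS_PER_REACTION / TAQ_STOCK_U_PER_UL * scale
    volumes["water"] = (reaction_ul - template_ul) * scale - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ul in master_mix(10).items():
        print(f"{name:<24}{ul:7.2f} ul")
```

For a single 25 µl reaction with these stocks, the components total 6.375 µl and water makes up the remaining 17.625 µl before the template is added.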
Table 6.
STSs and YACs in 6p22-p23

Probe         Primer set                   YACs (a)                           Annealing temp. (b)

D6S89         cttgttcatctgccttgtgcaccta    B126G2, B134D5, B172B3, B214D3,    55°C
              agcgactgcctaaac              C5C12, 191D8, 299B3, 379C2,
                                           468D12, 124G2, 511H11

AM10          ttaaggaagtgttcacatcaggg      A23C3, A183C6, A250D5, B238F12,    55°C
(D6S335)      aattgtgcttatgtcactggg        A91D2

A250D5-L      aattctggagagaggatgttggt      195B5, 242C5, 475A6, 30F12         44°C
(D6S337)      tctttttttggtag

64U           catcgtgttgtgtggtgaagctc      492H3, 172B5, 227B1, 261H7         50°C
              agacgctaaactcaagg

D6S288        atgatccgtggtagtggcagga       60H7, 351B10                       55°C
              cctgttactgacgcc

D6S274        ctcatctgttgaatggggatctta     486F9, 149H3, 42A5, 283B2,         55°C
              aatgctatgccttccg             320E12

FLB1          tgcaaatccctcagttcacttgctt    140H2, 270D3, 274D12, 401D6,       50°C
(D6S339)      gactttgccatgttc              57G3, 168F1

AM12          atacccatacggatttgagggca      A71B3, 228A1, 193B3, 90A12,        55°C
(D6S336)      acactatcaggctaagaatg         539C11, 53G12, 35E8

53G12-L       caaataccagcaactcaccagc       3G6, 82G12, 98G5, 135F6,           58°C
              ggttccttcagcatcctacattc      198C8, 330G1

(a) YACs in this study are from the CEPH and Washington University libraries. I.D. numbers identify the library source (Washington University I.D. numbers are preceded by a letter). Several YACs were identified with more than one STS; for such information, please refer to Table 2.

(b) PCR conditions were 94°C for 4 minutes, followed by 35-40 cycles of 94°C denaturation for 1 minute, annealing at the specified temperature for 1 minute, and 72°C extension for 2 minutes, with a final extension step of 7 minutes at 72°C. PCR buffer and primer concentrations are as described in the text; for the 53G12-L STS, formamide was added to the PCR reaction at a final concentration of 2%.
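The cycling profile in footnote (b) can be written down as data, which makes it easy to check total run time or to generate a program for each annealing temperature. This is a minimal sketch; the `Step` representation is a hypothetical choice, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    minutes: float


def sts_program(annealing_c: float, cycles: int = 35) -> list[Step]:
    """Cycling profile from footnote (b) of Table 6."""
    initial = [Step(94.0, 4.0)]                                         # initial denaturation
    cycle = [Step(94.0, 1.0), Step(annealing_c, 1.0), Step(72.0, 2.0)]  # one cycle
    final = [Step(72.0, 7.0)]                                           # final extension
    return initial + cycle * cycles + final


def hold_minutes(steps: list[Step]) -> float:
    """Total time at temperature, ignoring ramp times between steps."""
    return sum(step.minutes for step in steps)


if __name__ == "__main__":
    for cycles in (35, 40):
        program = sts_program(annealing_c=55.0, cycles=cycles)
        print(cycles, "cycles:", hold_minutes(program), "min at temperature")
```

By this count the profile spends 151 minutes at temperature with 35 cycles and 171 minutes with 40 cycles, before adding instrument ramp time.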
Yeast DNA-agarose blocks were prepared as described by D.C. Schwartz et al., Cell, 22, 67-75 (1984); and G.J.B. van Ommen et al. in Human Genetic Diseases - A Practical Approach, K.E. Davies, ed., pp. 113-117, IRL Press, Oxford (1986). All YAC clones were analyzed by pulsed-field gel electrophoresis (PFGE) to determine the insert size and to confirm that a single YAC was present in a given colony. YAC inserts were sized by electrophoresing yeast DNA through a 1% FastLane agarose (FMC, Rockland, ME) gel in 0.5x TAE (20 mM Tris-acetate/0.5 mM EDTA). For rapid detection of possible overlaps between YAC clones isolated at different STSs, the labelled Alu-PCR products of new YACs were hybridized to filters containing Alu-PCR products of individual YACs in the region. Most of the YAC clones were tested for chimerism using the Alu-PCR dot blot method described by S. Banfi et al., Nucleic Acids Res., 20, 1814 (1992): the Alu-PCR products from YAC clones were hybridized to a dot blot containing the Alu-PCR products from monochromosomal or highly reduced hybrids representing each of the 24 different human chromosomes. In addition, a dot blot containing Alu-PCR products from radiation-reduced hybrids representing different segments of 6p was used to ensure that a YAC does not contain two noncontiguous segments of 6p. Ends of YAC clones were isolated either by inverse-PCR as previously described by G. Joslyn et al., Cell, 66, 601-613 (1991), or by Alu-vector PCR as described by D.L. Nelson et al., Proc. Natl. Acad. Sci. USA, 88, 6157-6161 (1991). Alu-vector PCR was carried out using the Alu primers PDJ34 and SAL1, as described by C. Breukel et al., Nucleic Acids Res., 18, 3097 (1990), and the pYAC4 vector primers described by M.C. Wapenaar et al., Hum. Mol. Genet., 2, 947-995 (1993), and analogous vectors described by G.P. Bates et al., Nature Genetics, 1, 180-187 (1992). All YAC ends were regionally mapped by hybridization to Southern blots containing EcoRI-digested DNAs from the YAC clones and from the hybrid cell lines 1-7, R86, and R72.

4. Cosmid library preparation from YACs

Cosmid libraries were prepared from four YAC clones: 227B1, 195B5, A250D5, and 379C2. Genomic DNA from the YACs was partially digested with MboI and cloned into the cosmid vector SuperCos 1 (Stratagene, La Jolla, CA) following the manufacturer's recommendations. Clones containing human inserts were identified using radiolabeled sheared human DNA as a probe.

5. Long range restriction analysis
YAC plugs were digested to completion with rare-cutter restriction enzymes as described by M.C. Wapenaar et al., Hum. Mol. Genet., 2, 947-995 (1993), and analogously by G.A. Silverman et al., Proc. Natl. Acad. Sci. USA, 86, 7485-7489 (1989). Enzymes were purchased from New England Biolabs (Beverly, MA) and Boehringer Mannheim Biochemicals (Indianapolis, IN) and were used as recommended by the manufacturers. All PFGE analyses were performed on a Bio-Rad CHEF apparatus under conditions that separate DNA fragments in the 50-600 kb range. The gels were stained with ethidium bromide and either acid-nicked or exposed to 200,000 mJ of UV energy in a UV Stratalinker 1800 (Stratagene, La Jolla, CA). The gels were denatured in 0.4 N NaOH and transferred to Sure Blot hybridization membrane (Oncor, Gaithersburg, MD) in either 10x SSC (1.5 M NaCl/150 mM sodium citrate) or 0.4 N NaOH according to the manufacturer's recommendations. Filters were hybridized with the probes listed in Table 6 and Figure 6. In addition, pBR322 BamHI/PvuII fragments of 2.5 kb and 1.6 kb, specific for the left (TRP/CEN) and right (URA) pYAC4 vector arms, respectively, were used. Probes were radiolabeled using the random priming technique described by A.P. Feinberg et al., Anal. Biochem., 137, 266-267 (1984); repetitive sequences were blocked using sheared human placental DNA as previously described by P.G. Sealy et al., Nucleic Acids Res., 13, 1905-1922 (1985).

6. Dinucleotide repeat analysis

Primer sequences and PCR cycle conditions are presented in Table 6. Buffer conditions were the same as for Alu-PCR. All reaction volumes were 25 µl and contained 40 ng of genomic DNA. One primer of each pair was labelled at the 5' end with [γ-32P]dATP. Four microliters of each reaction was mixed with 2 µl of formamide loading buffer, denatured at 90-100°C for 3 minutes, and cooled on ice, and 4-6 µl was used for electrophoresis on a 4% polyacrylamide/7.65 M urea sequencing gel.



WO 95/01437 



PCT/US94/07336 



-55- 

B. Results

1. Generation of sequence tagged sites in 6p22-p23 and YAC screening

Comparative analysis of the Alu-PCR products from the radiation hybrid retaining D6S89 (R86) and from the four radiation hybrids deleted for D6S89 but retaining markers that flank D6S89 (R78, R72, R54, and R17) allowed the identification of three new DNA fragments that were present in R86 but absent in the other four. These three DNA fragments, termed AM10, AM12, and FLB1, were isolated and mapped using a 6p somatic cell hybrid panel and the radiation-reduced hybrid panel (H.Y. Zoghbi et al., Genomics, 2, 713-720 (1991)) to confirm their regional localization. All three mapped to 6p and to R86, confirming their close proximity to the D6S89 locus. These three Alu-PCR fragments were subcloned and sequenced to establish sequence-tagged sites (STSs). STSs at AM10, AM12, FLB1, and D6S89 were used to screen the Washington University and CEPH YAC libraries (H.M. Albertsen et al., Proc. Natl. Acad. Sci. USA, 87, 4256-4260 (1990); and B.H. Brownstein et al., Science, 244, 1348-1351 (1989)). YACs isolated at these four STSs were analyzed for overlap. Insert termini from the YACs representing contig ends were isolated, subcloned, and sequenced to establish new STSs for further YAC walking. In one case, an STS was established using a subclone of a cosmid from a cosmid library generated for YAC 195B5.

Recently, several highly informative dinucleotide repeat markers have been identified and mapped genetically by J. Weissenbach et al., Nature, 359, 794-801 (1992). As discussed above, two of these markers, D6S274 and D6S288, were found to map within the SCA1 critical region and were subsequently used to screen the YAC libraries. YAC clones were isolated using the STSs listed in Table 6.

2. Characterization of YAC clones

The sizes of the YAC inserts were determined by pulsed-field gel electrophoresis (PFGE); insert sizes ranged from 75 to 850 kb. Given the high frequency of insert chimerism, an Alu-PCR-based hybridization strategy for rapid detection of chimerism, as described by S. Banfi et al., Nucleic Acids Res., 20, 1814 (1992), was used. Thirty of the YAC clones were tested using this approach, and eight (27%) were found to be chimeric. Insert ends isolated from YACs determined to be nonchimeric by the dot blot hybridization approach mapped to 6p22-p23, with the exception of the two ends from 198C8, which proved to map to other chromosomes.

Two approaches, inverse-PCR (G. Joslyn et al., Cell, 66, 601-613 (1991)) and Alu-PCR (analogous to that described by D.L. Nelson et al., Proc. Natl. Acad. Sci. USA, 86, 6686-6690 (1989)), were used to isolate YAC ends. In total, 34 YAC ends were isolated; inverse-PCR yielded 26 ends and Alu-vector PCR yielded 8 ends. To isolate the left end of YAC 195B5, we screened a cosmid library prepared from this YAC using pYAC4 left-end sequences (S.K. Bronson et al., Proc. Natl. Acad. Sci. USA, 88, 1676-1680 (1991)) as a probe. This approach was taken because inverse-PCR yielded an end consisting predominantly of Alu-containing sequence and Alu-PCR failed to yield an end. Cosmid clone A32 was found to contain the left end of 195B5, and a subclone, 64U, was used to establish an STS for further YAC library screening.

To confirm the 6p22-p23 regional origin of all YAC ends or subclones, these fragments were used as probes against Southern blots containing EcoRI-digested DNAs from a somatic cell hybrid retaining 6p (I-7), from radiation-reduced hybrids known to retain fragments of 6p (H.Y. Zoghbi et al., Genomics, 2, 713-720 (1991)), and from the YAC clones at a particular STS.


3. Probe content mapping of YACs 

To define the degree of overlap between the clones and to detect possible rearrangements such as internal deletions of the YACs, a probe content mapping strategy was used based on: 1) PCR analysis of all the clones using all the STSs in the region, including both those described in Table 6 and those at highly informative dinucleotide repeats such as AM10GA and SB1; and 2) hybridization of Southern blots containing EcoRI-digested DNAs from YACs in the relevant region with densely spaced DNA probes derived from YAC ends, cosmid subclones of YACs, or Alu-PCR fragments from YACs. The results of this analysis for a representative subset of the YACs (32 clones) are summarized in Table 7. Thirty-nine YAC clones form an uninterrupted YAC contig from D6S274 to 82G12-R (the right end of YAC clone 82G12). Other than an internal deletion in one YAC (351B10), no deletions were detected within the resolution of this analysis; furthermore, the extent of chimerism for some YAC clones (such as 270D12 and 140H2) was determined. The centromere-telomere orientation of the YAC contig on 6p was determined using both genetic and physical mapping data. Dinucleotide repeat analysis at D6S109, AM10GA, D6S89, and SB1 in the key individual with a recombination event between D6S89 and SCA1 revealed that the recombination event occurred between AM10GA and D6S89. Given that D6S109 is centromeric to D6S89, the recombination analysis suggests that AM10GA is centromeric to D6S89. The centromere-telomere position of SB1 with respect to D6S89 could not be determined genetically.
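The core logic of probe content mapping is that two YACs sharing any STS or probe must overlap, so clones can be grouped into contigs by following shared probes. The sketch below does this with a union-find structure; the YAC names and probe content are hypothetical, and real contig assembly additionally orders the probes and checks for chimerism and internal deletions, as described above.

```python
def build_contigs(content: dict[str, set[str]]) -> list[list[str]]:
    """Group YACs into contigs: YACs sharing any STS/probe end up together."""
    parent = {yac: yac for yac in content}

    def find(yac: str) -> str:
        while parent[yac] != yac:
            parent[yac] = parent[parent[yac]]  # path halving
            yac = parent[yac]
        return yac

    first_yac_with_probe: dict[str, str] = {}
    for yac, probes in content.items():
        for probe in probes:
            if probe in first_yac_with_probe:
                # Shared probe: merge this YAC's group with the earlier one
                parent[find(yac)] = find(first_yac_with_probe[probe])
            else:
                first_yac_with_probe[probe] = yac

    groups: dict[str, list[str]] = {}
    for yac in content:
        groups.setdefault(find(yac), []).append(yac)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical STS content of four YACs
    content = {
        "yacA": {"sts1", "sts2"},
        "yacB": {"sts2", "sts3"},
        "yacC": {"sts3", "sts4"},
        "yacD": {"sts9"},
    }
    print(build_contigs(content))
```

Here yacA, yacB, and yacC chain together through sts2 and sts3 into one contig, while yacD remains a separate island until a probe links it to the others.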
TABLE 7.

Characterization of YACs using 6p22-p23 STSs and YAC fragments

YAC        Size (kb)   Chimerism
149H3      345         N
60H7       580         N
351B10     330         N
227B1      560         N
172B5      345         Y
195B5      365         N
475A6      365         N
242C5      340         N
A250D5     250         N
A23C3      530         Y
A183C6     120         N
B238F12    390         Y
A91D2      325         N
191D8      650         N
379C2      575         N
C5C12      75          N
B214D3     200         N
299B3      375         N
468D12     280         N
168F1      400         N
270D3      650         Y
274D12     240         N
140H2      440         Y
57G3       400         N
401D6      340         N
193B3      850         Y
228A1      350         Y
90A12      650         Y
35E8       400         N
53G12      370         N
135F6      400         N
82G12      380         N

Note. (+) = present, (-) = absent; Y/N = chimerism is/is not detected. YAC ends are identified by YAC names followed by L or R for left or right.
Physical mapping using both radiation hybrids and YACs was carried out to resolve the centromere-telomere order of the loci. The radiation-reduced hybrids R17 and R72 are known to contain markers centromeric to D6S89; these include D6S108 and D6S88, which map centromeric to D6S109. See, H.Y. Zoghbi et al., Genomics, 2, 713-720 (1991). R72 also retains D6S109, but a small gap in R17 was revealed because this radiation hybrid did not retain D6S109 yet was positive for an end isolated from a YAC at the D6S109 locus. Analysis of the radiation-reduced hybrids revealed that D6S274 and D6S288 are present in R17, R72, and R86, whereas AM10GA, D6S89, and SB1 are present only in R86 (Figure 5). Furthermore, STS content mapping with D6S260 and D6S289, two dinucleotide repeats that are telomeric to D6S288 (J. Weissenbach et al., Nature, 359, 794-801 (1992)), revealed that D6S260 is present in the same YACs as D6S89 and SB1 (379C2 and 168F1), and that D6S289 is present in 57G3 and 35E8, two YACs derived using the FLB1 and AM12 STSs, respectively. These data confirm that the order of the loci, as well as the centromere-telomere orientation of the YAC contig presented in Figure 6, is correct.

Figure 6 shows a selected subset of YAC clones that span the entire contig from D6S274 to 82G12-R. A minimum of 8 YACs spans this region. The positions of the STSs used to isolate the YACs are also shown. Based on the sizes of the YACs and their degree of overlap, this contig is estimated to span 2.5 Mb of genomic DNA in 6p22-p23, with D6S89 located approximately in the middle.

4. Delineating the SCA1 critical region 

Genetic studies using recently identified dinucleotide repeats (AM10GA and SB1) showed that SCA1 maps centromeric to the D6S89 locus, very close to AM10GA (peak lod score of 42.1 at a recombination fraction of zero), in nine large SCA1 kindreds (Example 1, above). Thus D6S89 is the closest flanking marker at the telomeric end. Previously, the closest flanking marker at the centromeric end was D6S109, a dinucleotide repeat estimated to be 6.7 cM centromeric to D6S89. To identify a closer flanking marker at the centromeric end, we mapped D6S260, D6S274, D6S288, and D6S289, four dinucleotide repeat-containing markers known to map to 6p22-p23 (J. Weissenbach et al., Nature, 359, 794-801 (1992)). The regional mapping of these markers was done using radiation-reduced hybrids and the YAC clones isolated from this region. These data revealed that D6S274 and D6S288 map centromeric to AM10GA, as evidenced by amplification of DNA from radiation hybrids R17 and R72, which are known to be centromeric to AM10GA. Genotype analysis was carried out on DNAs from individuals with key recombination events between D6S109 and D6S89, as well as from affected and normal individuals (to establish chromosomal phase), from the five SCA1 kindreds (MN-SCA1, MI-SCA1, TX-SCA1, M-SCA1, and MS-SCA1). This analysis revealed no recombination between D6S288 and SCA1. A single recombination event between D6S274 and D6S288 was detected in individual MN-1 from the MN-SCA1 kindred (Figure 7); this individual was one of the six individuals identified above as having a recombination event between SCA1 and D6S109. This analysis identified D6S274 as the closest flanking marker at the centromeric end. Combined with the data discussed above, these results place the SCA1 critical region between D6S274 and D6S89. This candidate region (1.2 Mb) is cloned in a minimum of four overlapping, nonchimeric YACs, as shown in Figure 8.

5. Long-range restriction mapping 

To estimate the size of the YAC contig in the SCA1 critical region, we performed long-range restriction analysis on YACs from this region. The YACs used for this analysis included 227B1, 60H7, 351B10, 172B5, 195B5, A250D5, 379C2, and 168F1. The following rare-cutter restriction enzymes were used: NotI, BssHII, NruI, MluI, and SacII. Restriction fragments separated by PFGE and transferred onto nylon membranes were detected by sequential hybridizations of the filter to several DNA probes, which included DNA probes specific for the left and right arms of the pYAC4 vector, insert termini from internal YAC clones, internal probes and cosmid subclones, and an Alu-specific probe. The positions and names of all the probes used in the long-range restriction analysis are shown in Figure 8. This analysis confirmed the internal deletion in YAC 351B10 and determined the extent of overlap between the YAC clones. The size of the SCA1 critical region was estimated to be 1.2 Mb. Internal deletions and/or other rearrangements could not be excluded for areas where only a single YAC was analyzed by restriction enzyme analysis; these include an approximately 220-kb region within YAC 195B5 and a 335-kb region within YAC 379C2.
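The rare-cutter enzymes named above recognize GC-rich palindromic sites, which is why they cut large genomic DNA into fragments of a size suited to PFGE. The sketch below locates these recognition sequences in a DNA string and converts cut positions into fragment lengths. It is an in-silico illustration only: the example sequence is hypothetical, the site start is used as the cut coordinate (ignoring the exact cleavage offset and any methylation sensitivity), and the mapping described above was done experimentally from hybridizing fragment sizes.

```python
import re

# Recognition sequences (5' to 3'); all five are palindromic, so scanning
# one strand finds every site.
RARE_CUTTERS = {
    "NotI": "GCGGCCGC",
    "BssHII": "GCGCGC",
    "NruI": "TCGCGA",
    "MluI": "ACGCGT",
    "SacII": "CCGCGG",
}


def find_sites(sequence: str) -> dict[str, list[int]]:
    """0-based start positions of each recognition site (overlaps allowed)."""
    seq = sequence.upper()
    return {name: [m.start() for m in re.finditer(f"(?={site})", seq)]
            for name, site in RARE_CUTTERS.items()}


def fragment_lengths(length: int, cut_positions: list[int]) -> list[int]:
    """Fragment sizes for a linear molecule cut at the given coordinates."""
    bounds = [0] + sorted(cut_positions) + [length]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    seq = "AAGCGGCCGCTTACGCGTAA"  # hypothetical sequence
    sites = find_sites(seq)
    print(sites)
    print(fragment_lengths(len(seq), sites["NotI"] + sites["MluI"]))
```

In the toy sequence, one NotI site and one MluI site are found, and a double digest would yield fragments of 2, 10, and 8 bp.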

III. Expansion of an Unstable Trinucleotide Repeat in SCA1 
A. Methods

1. Screening for trinucleotide repeats

Genomic DNA from YACs was partially digested with MboI and cloned into the cosmid vector SuperCos 1 (Stratagene) following the manufacturer's protocol. Clones containing human inserts were identified by hybridization with radiolabeled human DNA and were arrayed on a gridded plate. Filter lifts of cosmid clones from YAC 227B1 were screened for the presence of trinucleotide repeats by hybridization to a [γ-32P] end-labelled (GCT)7 oligonucleotide. In a parallel experiment, a mixture of 10 oligonucleotides representing the various permutations of trinucleotide repeats was end-labelled and hybridized to a Southern transfer of EcoRI-digested cosmids from YACs 195B5 and A250D5. Hybridizations were done in a solution of 1 M NaCl, 1% sodium dodecyl sulfate (SDS), and 10% (w/v) dextran sulphate. Filters were washed in 2x SSC (1x SSC is 0.15 M sodium chloride and 0.015 M sodium citrate) and 0.1% SDS at room temperature for 15 minutes, followed by a 15-minute wash at room temperature in a solution prewarmed to 67°C. Both strategies identified several positive clones, 22 of which were overlapping and contained the same 3.36-kb EcoRI fragment, which hybridized to the (GCT)7 probe and was ultimately shown by sequence analysis to contain the CAG repeat.
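Why 10 oligonucleotides suffice to represent all trinucleotide repeats: a tandem repeat read in a different frame (CAG, AGC, GCA) or on the opposite strand (CTG, TGC, GCT) is the same repeat, so the 60 trinucleotides that are not mononucleotide runs collapse into 10 classes of 6. The sketch below enumerates these classes; it also shows that the (GCT)7 probe and the CAG repeat fall in the same class, consistent with the result above.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rotations(seq: str) -> set[str]:
    """All cyclic shifts: (CAG)n, (AGC)n and (GCA)n are the same repeat
    read in different frames."""
    return {seq[i:] + seq[:i] for i in range(len(seq))}


def trinucleotide_repeat_classes() -> list[list[str]]:
    """Group the 64 trinucleotides into distinct repeat classes, treating
    frame shifts and the opposite strand as equivalent and skipping the
    four mononucleotide repeats (AAA, CCC, GGG, TTT)."""
    seen: set[str] = set()
    classes = []
    for codon in ("".join(p) for p in product("ACGT", repeat=3)):
        if len(set(codon)) == 1 or codon in seen:
            continue
        members = rotations(codon) | rotations(reverse_complement(codon))
        seen |= members
        classes.append(sorted(members))
    return classes


if __name__ == "__main__":
    classes = trinucleotide_repeat_classes()
    print(len(classes), "classes")
    print([c for c in classes if "CAG" in c][0])
```

Running it reports 10 classes, and the class containing CAG is ['AGC', 'CAG', 'CTG', 'GCA', 'GCT', 'TGC'], which includes GCT.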

2. Genomic digests and Southern blots

Genomic DNAs were digested with TaqI (Boehringer Mannheim, Indianapolis, IN) or BstNI (New England Biolabs, Beverly, MA) according to the manufacturers' recommendations. Southern blotting was done following standard protocols.


3. DNA sequencing 

To determine the DNA sequence in the region containing and flanking the CAG trinucleotide repeats, clone pGCT-7, containing the 3.36-kb EcoRI fragment, was subcloned. A 400-bp fragment containing the CAG trinucleotide repeats was generated from pGCT-7 by Sau3AI digestion and subcloned into the BamHI site of pBluescript KS- (Stratagene, La Jolla, CA) (clone pGCT-7.s1). In addition, pGCT-7 was digested with PstI to remove 1.3 kb of DNA and recircularized for transformation (clone pGCT-7.p2). The position of the trinucleotide repeats was determined by PCR using the (GCT)7 oligonucleotide and one of the flanking sequencing primers as PCR primers. Initial results indicated that the CAG trinucleotide repeats were on the reverse primer strand, about 1.3 kb from the reverse primer, that is, 400 bp from the PstI site. DNA sequencing was performed by the dideoxynucleotide chain-termination method using Sequenase and the ΔTaq Cycle-Sequencing kit (United States Biochemical, Cleveland, OH). Both universal (-40) and reverse primers were used for clone pGCT-7.s1, while only the universal (-40) primer was used for sequencing pGCT-7.p2.

4. RT-PCR and Northern analysis

Total RNA was extracted from lymphoblastoid cells using
guanidinium thiocyanate followed by centrifugation in a cesium chloride gradient.
Poly(A)+ RNA was selected using Dynabeads oligo(dT)25 from Dynal (Great Neck,
NY). First strand cDNA synthesis was carried out using MMLV reverse
transcriptase (BRL, Gaithersburg, MD). RT-PCR was carried out using hot start
PCR with three cycles of 97°C for 1 minute, 59°C for 1 minute, and 72°C for 1
minute for the Pre1 and Pre2 primer set, followed by 33 cycles of 94°C for 1
minute, 57°C for 1 minute, and 72°C for 1 minute. For the Rep1
and Rep2 primer pair, the same PCR cycling conditions were followed at lower
annealing temperatures of 57°C and 55°C, respectively. The RT-PCR products were
analyzed on a 6% NuSieve agarose gel. The northern blot containing various human
tissues was purchased from Clontech (Palo Alto, CA).
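The two-stage RT-PCR program above can be written down as data, which makes the two primer-specific variants easy to compare. This Python sketch encodes only what the text states: 3 hot-start cycles, then 33 cycles, with 1 minute per step. The `Step` and `Stage` names are illustrative, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    minutes: float


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple[Step, ...]


def rt_pcr_program(first_anneal_c: float, second_anneal_c: float) -> list[Stage]:
    """3 hot-start cycles followed by 33 cycles, each step lasting 1 minute."""
    return [
        Stage(3, (Step(97, 1), Step(first_anneal_c, 1), Step(72, 1))),
        Stage(33, (Step(94, 1), Step(second_anneal_c, 1), Step(72, 1))),
    ]


def total_cycles(program: list[Stage]) -> int:
    return sum(stage.cycles for stage in program)


def hold_minutes(program: list[Stage]) -> float:
    """Time spent holding at set temperatures; ramping is not modelled."""
    return sum(stage.cycles * sum(step.minutes for step in stage.steps) for stage in program)


pre_program = rt_pcr_program(59, 57)  # Pre1/Pre2 primer set
rep_program = rt_pcr_program(57, 55)  # Rep1/Rep2 pair (lower annealing temperatures)

if __name__ == "__main__":
    print(total_cycles(pre_program), hold_minutes(pre_program))
```

Both variants share 36 cycles and 108 minutes of hold time and differ only in their annealing steps. The Rep1/Rep2 reading (57°C in the hot-start cycles, 55°C afterwards) matches the lymphoblastoid RT-PCR conditions given later in Example IV.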

5. PCR Analysis

Fifty ng of genomic DNA was mixed with 5 pmol of each primer
(CAG-a/CAG-b or Rep-1/Rep-2) in a total volume of 20 µl containing 1.5 mM
MgCl2, 300 µM dNTPs (1.25 mM MgCl2 and 250 µM dNTPs for the Rep-1/Rep-2
primers), 50 mM KCl, 10 mM Tris-HCl pH 8.3, and 1 unit of AmpliTaq (Perkin
Elmer, Norwalk, CT). For the CAG-a/CAG-b primer pair, [α-32P]dCTP was
incorporated in the PCR reaction; for the Rep-1/Rep-2 primer pair, the Rep-1 primer was
labeled at the 5' end with [γ-32P]dATP. Formamide was used at a final
concentration of 2% when using the Rep-1/Rep-2 primer pair. Samples, overlaid
with mineral oil, were denatured at 94°C for 4 minutes, followed by 30 cycles of
denaturation (94°C, 1 minute), annealing (55°C, 1 minute), and extension (72°C, 2
minutes). Six microliters (µl) of each PCR reaction was mixed with 4 µl formamide
loading buffer, denatured at 90°C for 2 minutes, and electrophoresed through a 6%
polyacrylamide/7.65 M urea DNA sequencing gel. Allele sizes were determined by
comparing migration relative to an M13 sequencing ladder.
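The amounts in the PCR mixture above convert to final concentrations by simple division, because 1 pmol/µl equals 1 µM. The Python sketch below checks the primer and template figures. The 10 µM primer stock in the last calculation is a hypothetical value, chosen only to illustrate the pipetting volume.

```python
def final_concentration(amount: float, volume_ul: float) -> float:
    """Amount per microlitre of reaction; for pmol this is numerically equal to µM."""
    return amount / volume_ul


def volume_of_stock_ul(amount_pmol: float, stock_um: float) -> float:
    """Microlitres of a stock (in µM, i.e. pmol/µl) that deliver the requested pmol."""
    return amount_pmol / stock_um


REACTION_UL = 20.0

primer_um = final_concentration(5.0, REACTION_UL)            # 5 pmol of each primer
template_ng_per_ul = final_concentration(50.0, REACTION_UL)  # 50 ng genomic DNA
primer_stock_ul = volume_of_stock_ul(5.0, 10.0)              # HYPOTHETICAL 10 µM stock

if __name__ == "__main__":
    print(primer_um, template_ng_per_ul, primer_stock_ul)
```

Each primer is therefore present at 0.25 µM and the template at 2.5 ng/µl in the 20 µl reaction.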

B. Results

1. Cloning of the CAG repeat region in SCA1

As discussed above, in efforts to clone the SCA1 gene, key
recombination events were analyzed using several dinucleotide repeat
polymorphisms mapping to 6p22-p23 to identify the minimal region likely to
contain the SCA1 gene. This analysis revealed that there were no recombination
events between SCA1 and the centromeric marker D6S288 in five large kindreds or
between SCA1 and the telomeric marker AM10GA in nine large kindreds. A single
recombination event was detected between D6S274 and D6S288, identifying
D6S274 as the closest flanking marker at the centromeric end. At the telomeric end,
a single recombination event was detected between AM10GA and D6S89, which
identified the latter as the flanking marker. A yeast artificial chromosome (YAC)
contig extending from D6S274 to D6S89 and spanning the entire SCA1 candidate
region was developed. A subset of the YAC clones encompassing this region is
shown in Figure 9. Long-range restriction analysis determined the size of the SCA1
candidate region to be approximately 1.2 Mb. Cosmid libraries were constructed
from YACs 227B1, 195B5, A250D5, and 379C2. Arrays of cosmid clones
containing human inserts were hybridized with an oligonucleotide consisting of
tandemly repeated CAG, as well as with oligonucleotides containing other
trinucleotide repeats. Several hybridizing cosmid clones were identified, 23 of
which were positive for the CAG repeat and mapped to the region between D6S288
and AM10GA (Figure 9). All 22 of these clones shared a common 3.36-kb EcoRI
fragment that specifically hybridized to the CAG repeat.

2. Variability of the CAG Repeat Using Southern Analysis

To test the genetic stability of this repeat in SCA1, we used Southern
blotting analysis to examine families with juvenile onset SCA1. A two-generation
reduced pedigree from the TX-SCA1 family is shown in Figure 10a. Paternal
transmission of SCA1 with an expansion of a TaqI fragment was noted. A 2830-bp
fragment was detected in DNA from the unaffected spouse and on the normal
chromosome from SCA1 patients, whereas a 2930-bp fragment was found in DNA
from the affected father (onset at 25 years) and a 3000-bp fragment was detected in
DNA from his affected child with an onset at 4 years. In a second SCA1 kindred,
family MN-SCA1 (Figure 10b), two offspring inherited SCA1 from their father and
differed in their age at onset (25 years and 9 years). These individuals also differed in
the size of the amplified TaqI fragment they inherited from their affected father
(2900 bp and 2970 bp, respectively).

Enlargement of the (CAG)n-containing fragment on SCA1
chromosomes from the same TX-SCA1 juvenile onset family was also demonstrated
by Southern analysis following BstNI digestion. The BstNI fragment is 530 bp on
normal chromosomes, 610 bp in the SCA1-affected father, and 680 bp in the
affected juvenile onset offspring (Figure 10c). In each of these families,
nonpaternity was excluded by genotypic analysis with a large number (greater than
10) of dinucleotide repeat markers. In addition, the size of the (CAG)n-containing
TaqI fragment in DNA from 30 unaffected spouses was compared to the sizes of the
repeat-containing TaqI fragment in DNA from 62 individuals affected with late-onset
SCA1. The affected individuals are from five different SCA1 families: LA-SCA1,
MI-SCA1, MN-SCA1, MS-SCA1, and TX-SCA1. In all 30 unaffected
spouses, fragment sizes were approximately 2830 bp, and no expansions or
reductions were detected with transmission to offspring. In contrast, DNA from 58
of the 62 SCA1-affected individuals contained detectably expanded TaqI fragments
ranging in size from 2860 bp to 3000 bp in addition to the 2830-bp fragment. The
DNAs from the remaining four individuals were found to have an expansion when
analyzed by polymerase chain reaction (PCR). The expanded fragment always
segregated with disease, and in some cases the fragment expanded further in
successive generations. In the juvenile cases the expanded restriction fragment was
larger than that in the affected parent (uniformly the father in the cases analyzed),
supporting the conclusion that a DNA sequence expansion is the mutational basis of
SCA1.

3. Genomic DNA analysis of repeat regions

To identify the region involved in the DNA expansion, a 500-bp
(CAG)n-containing subclone of the 3.36-kb EcoRI fragment was sequenced, as was
the entire 3.36-kb fragment (Figure 1). This normal allele contained 30 CAG
repeat units. In two of the repeat units (positions 13 and 15), a T was present instead
of a G.

The expansion of the trinucleotide repeat was observed in all affected
individuals examined by PCR from five different kindreds representing at least two
ethnic backgrounds, American Black and Caucasian. Genotypic analysis using
DNA markers that are very closely linked to SCA1 (D6S274, D6S288, AM10GA,
D6S89 and SB1) revealed that four haplotypes segregate with disease
among the five families analyzed.

4. The trinucleotide repeat is transcribed

To test whether the CAG repeat lies within a gene, reverse
transcription-PCR (RT-PCR) was performed using primers immediately flanking
the repeat (Rep1 and Rep2) as well as primers that amplify a sequence
immediately adjacent to the repeat (Pre1 and Pre2). The RT-PCR analysis confirms
that the CAG repeat is present in mRNA from lymphoblasts. Furthermore, northern
blot analysis of human poly(A)+ RNA from various tissues, using a 1.1-kb subclone
(C208-1.1) from the 3.36-kb EcoRI fragment as a probe, identified a 10-kb transcript
that is expressed in brain, skeletal muscle and placenta and, to a lesser extent, in
kidney, lung and heart. Expression of this transcript is considerable in skeletal
muscle. When the 3.36-kb EcoRI fragment was used as a probe on the northern blot,
a transcript of the same size was detected.
5. PCR analysis of the CAG repeat

To confirm that the CAG repeats were involved in the observed length
variation, we analyzed the size of PCR-amplified fragments in 45 unaffected
spouses and 31 SCA1-affected individuals using synthetic oligonucleotides that
flank the CAG repeat. One pair of primers (CAG-a/CAG-b) was located within
9 bp of the repeats and identified length variation, indicating that the CAG repeats
are the basis of the variation.

Normal individuals displayed 11 alleles ranging from 25 to 36 repeat
units (Table 8). Heterozygosity in normal individuals was 84%. Examination of
this sequence in 31 individuals affected with SCA1 demonstrated that each was a
heterozygote with one allele within the size range seen in the normal individuals and
a second expanded allele within a range of 43 to 81 repeat units (Figure 11). Late-onset
SCA1 individuals showed at least 43 repeats, while 59-81 units were found in
the juvenile cases. Figure 12 depicts the correlation between the age at onset and the
number of repeat units. A linear correlation coefficient (r) of -0.845 was
obtained, indicating that 71.4% (r²) of the variation in the age at onset can be
accounted for by the number of (CAG)n repeat units. The largest trinucleotide
repeat expansions were noted in SCA1 patients with juvenile onset, who typically had
a more rapid course. It is of interest that all of these patients were offspring of
affected males, which is reminiscent of Huntington disease, in which there is a
preponderance of male transmission in juvenile cases.
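The 71.4% figure is the square of the reported correlation coefficient. The Python sketch below shows that step, together with a plain Pearson correlation function of the kind that would produce r from paired (repeat units, age at onset) data. The three-point example is made up for illustration and is not patient data.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation coefficient of paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Reported coefficient; r squared is the fraction of variance in age at onset
# accounted for by a linear fit on repeat number.
r_reported = -0.845
explained = r_reported ** 2

# Toy data only (hypothetical): more repeats paired with earlier onset.
toy_r = pearson_r([45.0, 55.0, 65.0], [40.0, 30.0, 20.0])

if __name__ == "__main__":
    print(f"{explained:.1%}", round(toy_r, 3))
```

The sign of r only indicates direction: the negative value means larger repeats go with earlier onset. Squaring removes the sign, giving the share of variance explained.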
Sequence analysis of the fragment containing the CAG repeat
indicated that there are several extended open reading frames. Translation of the
repeat in one of these frames (389 bp) would encode polyglutamine.
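A pure CAG run reads as CAG, AGC or GCA depending on the frame, so only one of the three frames gives polyglutamine; the other two give polyserine and polyalanine. The Python sketch below illustrates this using the standard genetic code. It also shows how the two CAT interruptions reported for the sequenced normal allele (units 13 and 15) would read as histidine in the glutamine frame. The input is a reconstructed repeat tract only and does not include the real flanking sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of a DNA sequence starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


pure_repeat = "CAG" * 10
frames = [translate(pure_repeat, f) for f in range(3)]

# 30-unit normal allele with CAT at units 13 and 15, as described in the text.
interrupted = "CAG" * 12 + "CAT" + "CAG" + "CAT" + "CAG" * 15

if __name__ == "__main__":
    print(frames)
    print(translate(interrupted))
```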

Table 8

Comparison of the number of CAG repeat units
on normal and SCA1 chromosomes

Number of      Normal Chromosomes       SCA1 Chromosomes
Repeats        Number    Frequency      Number    Frequency
≥60                0        0               4        0.13
50-59              0        0              17        0.55
43-49              0        0              10        0.32
37-42              0        0               0        0
35-36              1        0.01            0        0
30-34             49        0.55            0        0
<29               40        0.44            0        0
TOTAL             90        1.00           31        1.00
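The frequency columns of Table 8 are the counts divided by the column totals. The 90 normal chromosomes correspond to two from each of the 45 unaffected spouses, and the 31 SCA1 chromosomes apparently correspond to one expanded allele per affected individual. The short Python check below labels the top SCA1 row "≥60", on the assumption that it covers the repeat sizes above the 50-59 class.

```python
def frequencies(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total count and each class's share of it, rounded to 2 decimals."""
    total = sum(counts.values())
    return total, {label: round(n / total, 2) for label, n in counts.items()}


# Non-zero rows of Table 8. The "≥60" label is an inference (see lead-in).
normal_counts = {"35-36": 1, "30-34": 49, "<29": 40}
sca1_counts = {"≥60": 4, "50-59": 17, "43-49": 10}

if __name__ == "__main__":
    print(frequencies(normal_counts))
    print(frequencies(sca1_counts))
```

Straight rounding gives 0.54 for the normal 30-34 class (49/90 ≈ 0.544), whereas the table prints 0.55. The printed values do sum to 1.00, which may explain the difference.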



IV. Isolation of SCA1 cDNA
A. Methods 

1. Screening of cDNA libraries

Three cDNA libraries were screened: a human fetal brain library
from Stratagene (La Jolla, CA), a human fetal brain library constructed in λ-ZAP II
with the inserts cloned into the NotI restriction site (provided by Dr. Cheng Chi Lee
at Baylor College of Medicine), and an adult cerebellar cDNA library from
Clontech (Palo Alto, CA). The libraries were plated on 150 cm plates at a density
of 50,000 pfu per plate using bacterial strain LE392 (ATCC number 33572).
Hybond-N filters (Amersham, Arlington Heights, IL) were used to carry out plaque
lifts. The probes used in the first screening included a mixture of two
polymerase chain reaction (PCR) products, one obtained with the primers Rep1 and
Rep2 (Figure 3), which immediately flank the repeat, and one with the primers Pre1 and Pre2
(Figure 3), which amplify a sequence immediately adjacent to the repeat, as well as a 1.1-kb
subclone of the 3.36-kb EcoRI fragment (Figure 1). The 1.1-kb fragment (C208-1.1)
is located 540 bp 3' to the CAG repeat. A 9-kb EcoRI genomic fragment
derived from the same cosmids containing the CAG repeat was also used in this
screening. Subsequent rounds of screening were carried out on the same libraries
using as probes cDNA clones 31-5, 3J, 3c7-2 and 3e7 (Figure 13). Genomic and
cDNA probes were labeled using the random priming technique described in A.P.
Feinberg et al., Anal. Biochem., 137, 266-267 (1984). Repetitive sequences were
blocked as described in P.G. Sealy et al., Nucl. Acids Res., 12, 1905-1922 (1985).
Briefly, the probes were reassociated with a large excess of sheared human placental
DNA. The nonrepetitive regions remained single-stranded, and no separation of the
single-stranded fragments from the reassociated fragments was necessary to
allow the signal from low-copy-number components to be detected in subsequent
transfer hybridizations. Hybridization of the filters was then carried out following
standard protocols as described in H.Y. Zoghbi et al., Am. J. Hum. Genet., 42,
877-883 (1988).

2. DNA sequencing and sequence analysis

Shotgun libraries were constructed in M13 as described in A.T.
Bankier et al., Meth. Enzymol., 155, 55-93 (1987) for each of the following cDNA
clones: 8-8, 31-5, 3c5, 3c7-1, 3J, 3c7-2, 3c7 (Figure 13). Twenty to thirty M13
subclones were sequenced for each cDNA clone using an Applied Biosystems ABI
370A automated fluorescent sequencer, as described in R. Gibbs et al., Proc. Natl.
Acad. Sci. U.S.A., 1919-1923 (1989). Some cDNA clones (8-9b, 8-9a, AX1,
B21, B11, 3c28) were partially sequenced manually using a Sequenase sequencing
kit (USB, Cleveland, OH) on double-stranded templates, according to the
manufacturer's recommendations. The sequence coverage, in terms of numbers of
cDNA/genomic clones analyzed, was 3-4X in the coding region and 5'UTR and 2X in the
3'UTR. All RT-PCR, 5'-RACE-PCR and inverse-PCR products were sequenced
manually after subcloning into SmaI-digested pBluescript SK- plasmid (Stratagene,
La Jolla, CA) modified using the T-vector protocol as described in D. Marchuk et
al., Nucl. Acids Res., 19, 1154 (1990). Use of this protocol facilitates cloning.
Briefly, Taq polymerase ordinarily causes a template-independent addition of
adenosine at the 3' end of the PCR product, making blunt-end ligations difficult. In
the T-vector protocol, a thymidine is added to the 3' end of a digested plasmid. The
result is a one-base sticky end complementary to the 3' adenosine in the PCR
product, which greatly increases cloning efficiency.

Database searches were carried out using the GCG software package
(Genetics Computer Group, Madison, WI) and the BLAST network service from the
National Center for Biotechnology Information (S.F. Altschul et al., J. Mol. Biol.,
215, 403-410 (1990)). The sequence of the SCA1 transcript has been deposited in
GenBank, accession number X79204.


3. Northern blot, RT-PCR and genomic PCR analyses

The northern blot of poly(A)+ RNA from various human tissues and
the poly(A)+ RNA from adult human cerebellum were purchased from Clontech
(Palo Alto, CA). Poly(A)+ RNA from human lymphoblastoid cells was prepared by
first extracting total RNA using guanidinium thiocyanate, followed by
centrifugation in a cesium chloride gradient (P. Chomczynski et al., Anal. Biochem.,
162, 156-159 (1987)). Poly(A)+ RNA was selected using Dynabeads oligo(dT)25
from Dynal (Great Neck, NY). First strand randomly primed cDNA synthesis was
carried out using MMLV (Moloney murine leukemia virus) reverse transcriptase
(BRL, Gaithersburg, MD). This was conducted in a 20 µl reaction mixture
containing 3 µg RNA, first strand buffer (50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3
mM MgCl2) (BRL, Gaithersburg, MD), 10 mM dithiothreitol (BRL, Gaithersburg,
MD), 1 µM 3' end primer, 0.5 units RNasin (Promega, Madison, WI), 5.0 units
MMLV reverse transcriptase (BRL, Gaithersburg, MD), and 250 µM of each
deoxynucleoside triphosphate (dGTP, dATP, dCTP, dTTP). The mixture was
incubated for 20 minutes at 37°C and then put on ice. A 10 µl aliquot was used for the
PCR reaction. First strand randomly primed cDNA from human brain, liver and
adrenal was provided by Dr. G. Borsani (Baylor College of Medicine).
RT-PCR for detection of alternative splicing was carried out with
primers 9b and 5R and with primers 5F and 5R (Figure 15) under the following
conditions: an initial denaturation step at 94°C for 5 minutes, followed by 30 cycles of 94°C
for 1 minute, 60°C for 1 minute and 72°C for 2 minutes. The reaction mixture
contained 10 µl cDNA, PCR buffer (50 mM KCl, 10 mM Tris-HCl, pH 8.3, 1.25
mM MgCl2), 1 µM of the relevant 3' primer (primer 5R), 2% formamide and 1.25
units of AmpliTaq enzyme (Perkin Elmer, Norwalk, CT).

RT-PCR on lymphoblastoid cell lines with primers Rep1 and Rep2
for detection of expression of SCA1 mRNA was carried out using "hot start" PCR
with three cycles of 97°C for 1 minute, 57°C for 1 minute and 72°C for 1 minute,
followed by 33 cycles of 94°C for 1 minute, 55°C for 1 minute and 72°C for 1
minute. Twenty microliters of each PCR reaction was then resolved
on a 2% agarose gel (2 g UltraPure agarose (BRL, Gaithersburg, MD) in 40 mM
Tris-acetate, 1 mM EDTA, pH 8.0) and blotted onto SureBlot membrane (Oncor,
Gaithersburg, MD). The filter was hybridized with a (GCT)7 oligonucleotide
end-labeled with [γ-32P]ATP. Hybridizations were done in a solution of 1 M NaCl,
1% sodium dodecyl sulfate (SDS) (Sigma Chemical Company, St. Louis, MO) and
10% (w/v) dextran sulphate (Sigma Chemical Company, St. Louis, MO). Filters
were washed in 2 x SSC (1 x SSC is 0.15 M sodium chloride and 0.015 M sodium
citrate) and 0.1% SDS at room temperature for 15 minutes, followed by a 15-minute
wash at room temperature in a solution prewarmed to 67°C.

B. Results 

Two human fetal brain cDNA libraries were screened using as probes
various DNA fragments from the cosmid clone shown to contain the CAG repeat.
Five cDNA clones were identified; these included clone 31-5, containing the CAG
repeat, and clone 3J, which was found not to overlap with 31-5 (Figure 13).
Northern blot analysis revealed that clones 31-5 and 3J identified the same 11-kb
transcript, detectable in all tissues examined (Figure 14). Accordingly, the same two
human fetal brain cDNA libraries and a human adult cerebellar cDNA library were
used for several rounds of screening in order to obtain the full-length transcript. As a
result, 22 cDNA clones were isolated and characterized by sequence and PCR
analyses to assemble a contig spanning the SCA1 transcript. Twelve of the phage
clones spanning the cDNA contig are shown in Figure 13. These clones were
sequenced, allowing the assembly of the entire sequence of the SCA1 cDNA, which
spans 10,660 bp (Figure 15).

Sequence analysis revealed a coding region of 2448 bp starting with
a putative ATG initiator codon at base 936, located within a nucleotide sequence that
fulfills Kozak's criteria for an initiation codon (M. Kozak, J. Cell Biol., 229-241
(1989)). An in-frame stop codon is present 57 bp upstream of that ATG in three
independent cDNA clones as well as in genomic DNA. Furthermore, both the ATG
at the beginning of the coding region and the upstream stop codon have been found
in the murine homologue of SCA1 in the murine fetal brain library (Stratagene, La
Jolla, CA). The SCA1 gene therefore encodes a polypeptide of about 816 amino
acids with an expected size of 87 kD, designated ataxin-1. However, one cannot
exclude the possibility that the coding region begins at one of the other ATGs
located downstream of the first methionine, which would result in a smaller protein.
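Kozak's analysis singles out two positions around the initiator as most important: a purine (A or G) at -3 and a G at +4, counting the A of the ATG as +1. The Python sketch below scores a candidate ATG on just those two positions. The "strong/adequate/weak" labels are a common simplification and are not terminology taken from this document. The example sequences are hypothetical, not SCA1 sequence.

```python
PURINES = ("A", "G")


def kozak_strength(seq: str, atg_index: int) -> str:
    """Classify the context of the ATG starting at `atg_index` (0-based).

    Positions are numbered with the A of ATG as +1, so -3 is three bases
    upstream of the A and +4 is the base immediately after the G.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else None
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else None
    hits = int(minus3 in PURINES) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]


if __name__ == "__main__":
    for s in ("GCCACCATGG", "GCCTCCATGG", "TTTTTTATGT"):
        print(s, kozak_strength(s, 6))
```

This scores only the two dominant positions. A fuller treatment would also weigh the rest of the GCCRCCATGG consensus and, as the text does, the presence of an upstream in-frame stop codon.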

The CAG repeat is located within the coding region 588 bp from the
first methionine and encodes a polyglutamine tract. The open reading frame ends
with a TAG stop codon at base 3384. Therefore, this transcript has a 5' untranslated
region (5'UTR) of 935 bp and a 3' untranslated region (3'UTR) of 7277 bp. The
transcript ends with a tail of 57 adenosine residues; a polyadenylation signal,
AATAAA, is found 23 nucleotides upstream of the poly(A) tail. Homology
searches using both the DNA sequence of the coding region and the predicted
protein sequence (lacking the CAG repeat and the polyglutamine tract, respectively)
revealed no significant homology with other known proteins in the database.
Analysis of the sequence of ataxin-1 failed to reveal any strong
phosphorylation sites or any specific motifs such as DNA-binding or RNA-binding
domains. The putative secondary structure of this protein is compatible
with that of a soluble protein, as no hydrophobic domains were identified. A DNA
sequence database search revealed an identity between 380 bp in the 3'UTR of the
SCA1 transcript and an expressed sequence tag (EST04379) isolated from a human
fetal brain cDNA library (M.D. Adams et al., Nature Genet., 4, 256-267
(1993)).
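The lengths reported for the transcript can be cross-checked with a few lines of arithmetic. The Python sketch below takes the figures at face value. A 2448-bp coding region beginning at base 936 corresponds to exactly 816 codons. The 935-bp 5'UTR, the coding region and the 7277-bp 3'UTR together account for the 10,660 bp given for the full cDNA. The text does not spell out how the stop codon is apportioned between the coding region and the 3'UTR, so the sketch does not model it.

```python
def transcript_layout(first_atg: int, cds_bp: int, utr3_bp: int) -> dict[str, int]:
    """Derive layout figures (1-based cDNA coordinates) from the reported lengths."""
    if cds_bp % 3:
        raise ValueError("coding length is not a whole number of codons")
    utr5_bp = first_atg - 1
    return {
        "utr5_bp": utr5_bp,
        "codons": cds_bp // 3,
        "total_bp": utr5_bp + cds_bp + utr3_bp,
    }


if __name__ == "__main__":
    print(transcript_layout(first_atg=936, cds_bp=2448, utr3_bp=7277))
```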
V. Organization of the SCA1 Transcript: Evidence for Alternative
Splicing in the 5'UTR
A. Methods 

1. 5'-RACE-PCR

First strand cDNA was prepared from 1 mg of poly(A)+ RNA from
human adult cerebellum (Clontech, Palo Alto, CA) using the primer 5R (Figure 15)
as described in Example IV. 5'-RACE-PCR was carried out as described by M.A.
Frohman in PCR Protocols: A Guide to Methods and Applications, M.A. Innis et
al., Eds.; Academic Press: San Diego (1990), using SCA1 primers 5a and X4-1
(Table 9) as specific primers. The product was then electrophoresed through a 1.2%
agarose gel and blotted onto SureBlot hybridization membrane (Oncor, Gaithersburg,
MD) as described in Example II above. To test the specificity of the
product, the blot was then hybridized to an SCA1-specific probe consisting of a PCR product
spanning 118 bp between primer 9b in exon 1 and primer X3-1 (Table 9) in exon 3.



Table 9. Primer sequences for inverse-PCR

Exon  Primer 1                                  Primer 2
2     X2-1 (181-164)   GTAGTAGTTTTTGTGAGG       X2-2 (185-203)   CACCAAGCTCCCTGATGGA
3     X3-1 (246-229)   GCTTGAATGGACCACCCT       X3-2 (277-296)   ATCTCCTCCTCCACTGCCAC
4     X4-1 (347-329)   AGACTCTTTCACTATGCTC      X4-4 (407-425)   TTCAGCCTGCACGGATGGT
5     5a (482-463)     TGGCAGTGGAGAATCTCAGT     5-2 (519-538)    TGCTGCAAGGAACTGATAGC
6     10a (598-580)    AATGGTCTAATTTCTTTGG      10b (607-625)    GAGAAAGAAATCGACGTGC
7     6-1 (714-695)    ACAGGCTCTGGAGGGCTCCT     X5-2 (723-742)   TCCATGGTGAAGTATAGGCT
9     9-1 (2919-2900)  AGCAGGATGACCAGCCCTGT     9-2 (2939-2957)  GCTCTTTGATTTGCCGTGT

All primers are read in the 5' to 3' direction. Numbers in parentheses represent
the coordinates of each primer within the SCA1 cDNA sequence (Figure 15).
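The coordinates in Table 9 encode each primer's length and orientation. Primer 1 coordinates run high to low, so that primer is complementary to the cDNA as written and extends toward its 5' end. Primer 2 coordinates run low to high, so it extends toward the 3' end. For inverse-PCR the two primers must point away from each other. The Python sketch below checks, for every exon, that each sequence length matches its coordinate span and that the pair diverges. The data are copied from Table 9, and the function names are illustrative.

```python
# (name, 5' coordinate, 3' coordinate, sequence 5'->3') for primer 1 and primer 2, from Table 9.
TABLE_9 = {
    2: (("X2-1", 181, 164, "GTAGTAGTTTTTGTGAGG"), ("X2-2", 185, 203, "CACCAAGCTCCCTGATGGA")),
    3: (("X3-1", 246, 229, "GCTTGAATGGACCACCCT"), ("X3-2", 277, 296, "ATCTCCTCCTCCACTGCCAC")),
    4: (("X4-1", 347, 329, "AGACTCTTTCACTATGCTC"), ("X4-4", 407, 425, "TTCAGCCTGCACGGATGGT")),
    5: (("5a", 482, 463, "TGGCAGTGGAGAATCTCAGT"), ("5-2", 519, 538, "TGCTGCAAGGAACTGATAGC")),
    6: (("10a", 598, 580, "AATGGTCTAATTTCTTTGG"), ("10b", 607, 625, "GAGAAAGAAATCGACGTGC")),
    7: (("6-1", 714, 695, "ACAGGCTCTGGAGGGCTCCT"), ("X5-2", 723, 742, "TCCATGGTGAAGTATAGGCT")),
    9: (("9-1", 2919, 2900, "AGCAGGATGACCAGCCCTGT"), ("9-2", 2939, 2957, "GCTCTTTGATTTGCCGTGT")),
}


def length_matches(five_prime: int, three_prime: int, seq: str) -> bool:
    """True if the sequence length equals the inclusive coordinate span."""
    return len(seq) == abs(five_prime - three_prime) + 1


def pair_diverges(primer1: tuple, primer2: tuple) -> bool:
    """Primer 1 must extend toward lower coordinates, primer 2 toward higher ones,
    with primer 1's 5' end lying upstream of primer 2's 5' end."""
    _, p1_5, p1_3, _ = primer1
    _, p2_5, p2_3, _ = primer2
    return p1_5 > p1_3 and p2_5 < p2_3 and p1_5 < p2_5


def check_table(table: dict) -> dict[int, bool]:
    return {
        exon: all(length_matches(p[1], p[2], p[3]) for p in pair) and pair_diverges(*pair)
        for exon, pair in table.items()
    }


if __name__ == "__main__":
    print(check_table(TABLE_9))
```

Every row passes both checks. A check like this is a cheap guard against transcription errors when primer tables are re-keyed.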
B. Results

To characterize the genomic region flanking the CAG repeat, the
3.36-kb EcoRI genomic fragment known to contain this repeat was completely
sequenced. Alignment of this genomic sequence with the cDNA sequence allowed
us to determine that the 3.36-kb EcoRI fragment contains a 2080-bp exon, which has
160 bp of 5'UTR, the first potential initiation codon and the first 1920 bp of the
coding region. PCR analysis of genomic DNA showed that the rest of the coding
region lies within the next downstream exon. The last coding exon, which maps
to a 9-kb EcoRI fragment in genomic DNA, also contains 7277 bp of 3'UTR, for a
total length of 7805 bp (Figure 16a).
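The exon sizes given above are consistent with the 2448-bp coding region reported from the cDNA analysis. The short Python check below makes this explicit; it only restates figures from the text.

```python
CODING_TOTAL_BP = 2448  # coding region length from the cDNA analysis

exon_2080 = {"utr5_bp": 160, "coding_bp": 1920}  # exon in the 3.36-kb EcoRI fragment
last_exon = {"total_bp": 7805, "utr3_bp": 7277}  # last coding exon (9-kb EcoRI fragment)


def coding_in_last_exon(exon: dict[str, int]) -> int:
    """Coding bases in the last exon = exon length minus its 3'UTR."""
    return exon["total_bp"] - exon["utr3_bp"]


if __name__ == "__main__":
    first = exon_2080["utr5_bp"] + exon_2080["coding_bp"]
    last = coding_in_last_exon(last_exon)
    print(first, last, exon_2080["coding_bp"] + last == CODING_TOTAL_BP)
```

The last coding exon therefore carries 528 bp of coding sequence. Together with the 1920 bp in the 2080-bp exon, it accounts for the full 2448-bp coding region.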

Evidence for alternative splicing in the 5'UTR was initially
suggested by the hybridization pattern of the two most 5' cDNA clones, 8-8
and 8-9b (Figure 13), to Southern blots containing EcoRI-digested genomic DNA
from total human DNA and YACs spanning the SCA1 region. At least three
strongly hybridizing fragments in addition to the 3.36-kb EcoRI fragment were seen.
As neither of the cDNA clones contains an EcoRI site, this result suggested the
presence of several exons in the 5'UTR of the SCA1 transcript. Given these data
and the unusual length of the 5'UTR, this region was characterized in more detail.

Alignment analysis of the sequence of clones 8-8 and 8-9b revealed
the presence of two different 5' sequences diverging at base pair 322. This result
was highly suggestive of alternative splicing. To test this hypothesis,
reverse transcription-PCR (RT-PCR) was performed on mRNA from cerebellar
tissue using the primers indicated in Figure 15. When the primers 9b (specific for
clone 8-9b) and 5R (present in both clones) were used in the RT-PCR analysis, three
products were obtained: one of the expected size (246 bp) and at least two
larger fragments (Figure 16b). The same result was obtained when RT-PCR was
carried out on liver, adrenal, brain and lymphoblast cDNAs. The various RT-PCR
products were cloned and sequenced. Sequence analysis of all these products and
comparison with the sequence of phage clones 8-8 and 8-9b confirmed that they
were the result of alternative splicing. Figure 16a shows the structure of all the
cDNA clones that contain the 5' exons of the SCA1 gene and depicts the splice
variants. Based on sequence analysis of three cDNA clones and characterization of
cerebellar RT-PCR products, five exons (exons 1 through 5) were identified and
their borders in the transcript were determined. Exons 2, 3 and 4 are alternatively
spliced in the clones examined and in cerebellar tissue, whereas exon 5 was present
in all the cDNA clones and RT-PCR products.

Rescreening of cDNA libraries with clones 8-8 and 8-9b as probes
did not yield any additional cDNA clones. To identify additional alternatively
spliced exons in the 5'UTR and to confirm initial results, 5'-RACE-PCR was
carried out on reverse-transcribed cerebellar mRNA using primers from the 5' end of
exons 5 and 4. A 218-bp product was identified, and its specificity was confirmed
by Southern analysis using an internal PCR product as probe. Sequence analysis of
the 5'-RACE-PCR product furthermore confirmed the alternative splicing of two
exons (2 and 3) and allowed the identification of an additional 127 bp at the 5' end
of this gene (Figure 16a).

VI. Identification of Intron-Exon Boundaries and Determination
of the Genomic Structure of SCA1

A. Methods 

1. Identification of intron-exon boundaries

The boundaries of exons 2-9 were identified by inverse-PCR. To
carry out inverse-PCR, YAC agarose plugs were digested to completion as
described in M.C. Wapenaar et al., Hum. Mol. Genet., 2, 947-952 (1993), using
frequent-cutter restriction enzymes such as Sau3AI, TaqI, HaeIII and MspI,
purchased from Boehringer Mannheim Biochemicals (Indianapolis, IN) and used as
recommended by the manufacturer. The plugs were then digested with β-agarase I
(USB, Cleveland, OH) following the manufacturer's recommendations and
subsequently extracted with phenol-chloroform (Boehringer Mannheim Biochemicals,
Indianapolis, IN), precipitated with ethanol and resuspended in 12 ml of
TE (10 mM Tris-HCl, 1 mM EDTA, pH 8). Fifty ng of DNA from each digest
was then circularized according to the published protocol of J. Groden et al., Cell,
66, 589-600 (1991). Diverging PCR primers were designed within the cDNA and
used on the circularized product under the amplification conditions described in J.
Groden et al., Cell, 66, 589-600 (1991). PCR products were then subcloned and
sequenced as described in Example II, above. Inverse-PCR identified all
intron/exon boundaries except the boundary of exon 1. Accordingly, a 9-kb EcoRI
genomic fragment found to contain exon 1 was subcloned from a cosmid derived
from YAC 227B1 (Example II). This subclone was subsequently partially
sequenced to identify the boundary of exon 1.
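The logic of inverse-PCR can be illustrated with a toy model. Once a restriction fragment is self-ligated, two primers that face away from each other on the linear sequence face toward each other around the circle. The product therefore runs from one primer, through the unknown flanking DNA and the ligation junction, to the other. The Python sketch below is a simplified illustration that assumes perfect primer matches, a single binding site per primer, and a fragment given as its sense strand. The sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverse_pcr_product(fragment: str, fwd: str, rev: str) -> str:
    """Predict the inverse-PCR product from a circularized restriction fragment.

    fragment: linear fragment (sense strand) whose two ends were ligated together.
    fwd:      primer matching the sense strand, extending toward the fragment's right end.
    rev:      primer written 5'->3' on the antisense strand, extending toward the left end.
    """
    fragment, fwd, rev = fragment.upper(), fwd.upper(), rev.upper()
    f = fragment.find(fwd)
    rev_site = revcomp(rev)  # where `rev` anneals, written on the sense strand
    r = fragment.find(rev_site)
    if f < 0 or r < 0:
        raise ValueError("primer binding site not found")
    if r >= f:
        raise ValueError("primers do not diverge on the linear fragment")
    # Read from the forward primer to the right end, across the ligation junction,
    # then from the left end up to the end of the reverse primer's site.
    return fragment[f:] + fragment[:r + len(rev_site)]


if __name__ == "__main__":
    # Hypothetical fragment: left flank TTTT, known cDNA region, right flank CCCC.
    frag = "TTTTACGTACGGCATGCACCCC"
    print(inverse_pcr_product(frag, fwd="CATGCA", rev="GTACGT"))
```

In the toy product, the junction CCCC|TTTT joins the two unknown flanks. In the real experiment, sequencing across such a region reveals the intron sequence adjoining an exon.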

2. Mapping of cDNA clones to the YACs and cosmids

Southern blots containing EcoRI-digested DNAs from YACs
spanning the SCA1 critical region, as well as Southern blots containing DNAs from
the YACs digested with rare-cutter enzymes (see previous section), were hybridized,
using the standard protocol described in H.Y. Zoghbi et al., Am. J. Hum. Genet., 42,
877-883 (1988), to various SCA1 cDNA clones and to all the genomic fragments
containing the intron-exon boundaries. Briefly, restriction fragments were separated
by electrophoresis on 0.7% agarose gels, denatured and transferred to Nytran
(Schleicher and Schuell, Keene, NH) filters. Probes were 32P-labeled using the
oligohexamer labeling method (A.P. Feinberg et al., Anal. Biochem., 132, 6-13
(1983)). After hybridization, the filters were washed and autoradiography was
performed as described in Zoghbi et al., Am. J. Hum. Genet., 42, 877-883 (1988).

B. Results

Complete sequencing of the 3.36-kb EcoRI fragment provided the
intron-exon boundaries for the 2080-bp exon containing most of the coding region
(Figure 17). To determine the actual number of exons and to obtain all of
the intron-exon boundaries, an inverse-PCR strategy was adopted using two
overlapping YAC clones, 227B1 and 149H3, known not to contain any
rearrangements (see Example II). A total of nine exons, seven of which are in the
5'UTR, were identified, and splice junctions for exons 1 through 9 were subcloned
and sequenced (Figure 17). The schematic at the top of Figure 16a shows the nine
exons and their respective sizes. In the 5' untranslated region, alternative splicing
involves exons 2, 3 and 4, but not exons 5, 6 and 7, in the more than 5 phage cDNA clones
analyzed. The putative exon 1 encompasses 157 bp and hybridizes very strongly to
an EcoRI fragment derived from hamster genomic DNA.

To study the genomic organization of the SCA1 gene, ten cDNA
clones, as well as genomic fragments containing the splice junctions for all the exons,
were mapped by Southern analysis and localized on a long-range restriction map of
four overlapping YAC clones spanning the SCA1 critical region (Figure 18). This
analysis revealed that the gene spans at least 450 kb of genomic DNA and that the
putative first exon maps to a genomic fragment containing a hypomethylated CpG
island. Detailed restriction analysis of the intron between the two coding exons (8
and 9) revealed that this intron is approximately 4.5 kb in length. The sizes of the
remaining introns were estimated from the long-range restriction map and by PCR
analysis and ranged from 650 bp (intron 2) to nearly 200 kb (intron 7) (Figure 18).

VII. Expression of the SCA1 mRNA in SCA1 Patients

As a first step toward understanding the mechanism by which the
expansion of a trinucleotide CAG repeat leads to neurodegeneration in SCA1, the
level of transcription of SCA1 from the expanded alleles in patients was
investigated. RT-PCR was carried out with primers Rep1 and Rep2, which flank the
CAG repeat, as described in Example V, using lymphoblastoid mRNAs from SCA1
patients with repeat sizes ranging from 43 to 69. This analysis revealed that mRNA
was expressed from both the normal allele and the expanded allele (Figure 19).

VIII. Cloning of portions of the SCA1 Gene into the pMAL™-c2 Vector

DNA from the SCA1 gene was cloned into the pMAL™-c2 vector 
20 (New England Biolabs, Beverly, MA), which produces a chimeric protein consisting 
the maltose-binding protein fused to the N-terminus of the protein of interest 
(ataxin-1) in a linkage that can subsequently be conveniently cleaved. To obtain 
DNA for cloning, SCA1 DNA was amplified and isolated clone 31-5 (Figure 13) 
using standard PCR techniques. The manufacturer's instructions were followed in 
25 designing the appropriate oligonucleotide primers (pMAL™ vector Package Insert, 
1992 New England Biolabs, revised 4/7/92). In each case an EcoKL linker site was 
designed into the 5' primer and a Hindlll linker site was designed into the 3* primer 
to facilitate cloning. Three different amplification products were obtained. In one, 
DNA was isolated utilizing two 20-mer PCR primers COD and RCOD (Table 10) 
30 that hybridized to the 5' and 3' ends of the coding regions, such that the stretch of 
DNA being amplified contained residues presumed to encode the entire sequence of 
ataxin-1, beginning with Metl and ending with Lys 817 (Figure 15). The amplified 
product was than cloned into the EcoRUHindlll site in the polylinker region of in 
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pMAL™-c2 following instructions provided by the manufacturer. Two other 
constructs were made in the same way using PCR to isolate shorter segments of 
DNA. In both cases the same 3' end primer was used, but different 5' primers were 
employed (Table 10). One 5' primer (3COD) was designed such that the amplified 
5 product began at Met277 (the fourth methionine in the coding region), the other 5' 
primer (8COD) such that the amplified product began at Met548. pMAL™-c2 was 
transformed into competent cells containing a lacZAMIS 

allele for a-complementation and cultured as recommended by the manufacturer. 
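The primer design described above (an EcoRI linker on each 5' primer, a HindIII linker on the 3' primer, with the gene-specific part in frame) can be checked in silico. The Python sketch below is an illustration only, not part of the original protocol: it takes the four sequences listed in Table 10, strips everything up to and including the linker site, and translates the remainder. The function names are hypothetical, and setting the reading frame of the reverse primer from its 3' end (so that it finishes on a complete codon) is an assumption.

```python
"""Illustrative in-silico check of the Table 10 cloning primers (hypothetical helper names)."""

ECORI = "GAATTC"
HINDIII = "AAGCTT"

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Sequences copied from Table 10, spaces removed.
PRIMERS = {
    "COD": "TGTGAATTCATGAAATCCAACCAAGAGCG",
    "3COD": "TGTGAATTCATGATCCCACACACGCTCAC",
    "8COD": "TGTGAATTCATGGTGCAGGCCCAGATC",
    "RCOD": "TTCGAAGCTTCTACTTGCCTACATTAGAC",
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def after_site(primer: str, site: str) -> str:
    """Return the gene-specific part of a primer, i.e. everything 3' of the linker site."""
    pos = primer.find(site)
    if pos < 0:
        raise ValueError(f"{site} not found in {primer}")
    return primer[pos + len(site):]


def forward_peptide(primer: str) -> str:
    """Peptide encoded by a 5' primer, read from the first base after the EcoRI site."""
    return translate(after_site(primer, ECORI))


def reverse_peptide(primer: str) -> str:
    """Peptide encoded on the sense strand by a 3' primer.

    Assumption: the reverse-complemented gene-specific part ends on a codon
    boundary, so the reading frame is set from its 3' end.
    """
    sense = reverse_complement(after_site(primer, HINDIII))
    return translate(sense[len(sense) % 3:])


if __name__ == "__main__":
    for name in ("COD", "3COD", "8COD"):
        print(name, forward_peptide(PRIMERS[name]))
    print("RCOD", reverse_peptide(PRIMERS["RCOD"]))
```

Run as a script, it reports MKSNQE, MIPHTL and MVQAQI for the three 5' primers (each starting with Met, as the design requires) and SNVGK* for RCOD, i.e. a Lys codon immediately followed by a stop, consistent with an amplified product ending at Lys817.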

Table 10
Primers for Cloning into the pMAL™-c2 Vector

Primer Name   Nucleotide Sequence
COD           TGT GAA TTC ATG AAA TCC AAC CAA GAG CG
3COD          TGT GAA TTC ATG ATC CCA CAC ACG CTC AC
8COD          TGT GAA TTC ATG GTG CAG GCC CAG ATC
RCOD          TTC GAA GCT TCT ACT TGC CTA CAT TAG AC

IX. Expression of Ataxin-1, Design of Antigenic Peptides and
Production of Antibodies

The fusion proteins expressed by the constructs in Example VIII were
purified as directed by the manufacturer using affinity chromatography (pMAL™
vector Package Insert, 1992, New England Biolabs, revised 4/7/92). The purified
protein was electrophoresed on an 8% SDS-polyacrylamide gel and
electroeluted. The best expression (about 27 mg from 1 L of cells) was obtained
from the shortest construct, but all constructs produced measurable levels of protein
of a size consistent with their respective cloned gene products.

Antibody response in rabbits was initiated using the multiple
antigenic peptide strategy of V. Mehra et al., Proc. Natl. Acad. Sci. USA, 83, 7013-7017
(1986). In addition to the three electroeluted cloned gene products described
in the preceding paragraph, three synthetic peptides were used. The
synthetic peptides were Peptide A (amino acids 4 through 18), Peptide B
(amino acids 162 through 176) and Peptide C (amino acids 774 through 788).
These peptides were chosen such that they showed little or no homology with other
known short amino acid stretches in proteins and such that they contained
proline, which makes it more likely that these fragments are located on the surface
of the protein and, therefore, that antibodies to the fragments will react
with the whole protein as well.

Immunoglobulin G (IgG) from rabbit blood was purified, and
antibody/antigen results were analyzed using Western blots as described in Gershoni
et al., Anal. Biochem., 131, 1-15 (1983). IgG from rabbits injected with the cloned
gene products and the synthetic peptides was found to react with their
respective antigens. The antisera from rabbits immunized with the 8COD-RCOD
gene product (i.e., the ataxin-1 fragment spanning residues 548 through 817)
reacted with a protein of the expected size in brain tissue extracts from mice,
rats, and humans. A protein of similar size has also been detected in lymphoblasts.
This reaction is blocked by preincubation with the polypeptide antigen, but not
by unrelated antigens. In particular, antibodies raised against Peptide C are
blocked by either Peptide C or the short gene product.

X. Molecular and Clinical Correlations in Spinocerebellar Ataxia
Type 1 (SCA1)

A. Materials and Methods 

1. Family Material 

Members representing 87 kindreds with dominantly inherited ataxia
were evaluated. Nine kindreds of diverse ethnic background (Caucasian American,
African American, South African, Siberian Iakut) were already known to have
SCA1 based on linkage to the HLA locus and to D6S89 on chromosome 6p.
Genotypic analysis of the SCA1 CAG repeat was carried out on all nine kindreds to
determine whether all known SCA1 families had the same mutational mechanism
involving repeat expansion. Most of the study participants were personally
examined. The affected status was always confirmed by a neurologist, but the age
of onset was based on historical information from the patient and/or other family
members. Severity of disease was measured as the age at death minus the age of
onset. Detailed characterization of the repeat variability was carried out for all nine



WO 95/01437 



PCT/US94/07336 



-79- 

kindreds. To identify additional kindreds with a CAG expansion at the SCA1 locus,
affected individuals from 78 newly identified families with dominantly inherited
ataxia were clinically examined. Blood was collected from at least one affected
individual from each of these kindreds and screened by DNA analysis for the
presence of a CAG repeat size within the expanded range (> 42 repeats). Although
there was no evidence that these 78 individuals are related, there is a chance that
some of the affected patients come from the same families.

To assess the distribution of CAG repeat sizes on normal
chromosomes further, the number of CAG repeats was determined for 304 normal
chromosomes from unrelated individuals of various ethnic backgrounds.

2. Molecular Studies 

Blood samples were used to establish lymphoblastoid cell lines by
Epstein-Barr virus transformation. Genomic DNA was isolated either directly from
venous blood or from lymphoblastoid cell lines. Blood samples were collected from
these patients over an 8-year period, during which time 29 patients died. PCR
reactions were performed using the Rep1 (TTGACCTTTACACCTGCAT) and
Rep2 (CAACATGGGCAGTCTGAG) primers. Fifty nanograms of genomic DNA
was mixed with 5 pmol of each primer in a total volume of 20 µl containing 1.25
mM MgCl2, 250 µM dNTPs, 50 mM KCl, 2% formamide, 10 mM Tris-HCl pH 8.3
and 1 unit AmpliTaq (Perkin-Elmer/Cetus). The Rep1 primer was labelled at the 5'
end with [γ-32P]ATP. Samples were denatured at 94°C for 4 minutes, followed by
30 cycles of denaturation (94°C, 1 minute), annealing (55°C, 1 minute) and
extension (72°C, 2 minutes). Six µl of each PCR reaction was mixed with 4 µl
formamide loading buffer, denatured at 90°C for 2 minutes, and electrophoresed
through a 6% polyacrylamide/7.65 M urea DNA sequencing gel. Allele sizes were
determined by comparing migration relative to an M13 sequencing ladder.
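Turning a sized PCR product into a CAG repeat number requires knowing how much of the product lies outside the repeat. That flanking length for the Rep1/Rep2 product is not given in this section, so the sketch below takes it as a parameter (`flank_bp`, hypothetical), and the labels merely echo the ranges reported in the Results (normal alleles 19-36 repeats, expanded alleles 42-81 repeats); they are not a diagnostic rule.

```python
def repeats_from_product(product_bp: int, flank_bp: int) -> int:
    """Estimate the CAG repeat number from a PCR product size.

    flank_bp is the combined length of primers and non-repeat sequence in the
    product (a hypothetical value here; it depends on the primer positions).
    """
    repeat_bp = product_bp - flank_bp
    if repeat_bp < 0:
        raise ValueError("product is shorter than the flanking sequence")
    # Sizing against a sequencing ladder is read to the nearest base,
    # so round to the nearest whole triplet.
    return round(repeat_bp / 3)


def describe(n: int) -> str:
    """Place a repeat number relative to the ranges observed in this study."""
    if 19 <= n <= 36:
        return "within observed normal range (19-36)"
    if 42 <= n <= 81:
        return "within observed expanded range (42-81)"
    return "outside the ranges observed in this study"


if __name__ == "__main__":
    FLANK_BP = 100  # hypothetical flanking length, for illustration only
    for size in (187, 268):
        n = repeats_from_product(size, FLANK_BP)
        print(size, n, describe(n))
```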

3. Statistical Analyses 

The relationship between age of onset and CAG repeat number on
both the affected and the normal chromosomes of patients was evaluated through
linear regression analyses. Similarly, the relationship between repeat length and
duration of disease was quantified. Ages of onset were used directly in these
analyses, but also following logarithmic and square root transformation. Although
the latter transformation provided the best approximation to a normal distribution,
results obtained were consistent between analyses before and after transformation.
Analysis of variance was performed to detect differences among the families in the
mean age of onset, after correction for the effect of the CAG repeat number on age
of onset. In addition, the sex of the transmitting parent was included as a possible
explanatory variable for variations in age of onset. All regression and variance
analyses were carried out with the SPSS package of computer programs, version
4.0.1.
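The regression described above (age of onset against CAG repeat number, optionally after a logarithmic or square-root transformation of age of onset) can be sketched with ordinary least squares. The study itself used SPSS 4.0.1; the Python below only illustrates the same model, and the example numbers are invented, not data from this study. On this model, a negative slope means earlier onset with longer repeats.

```python
import math
from typing import Sequence, Tuple

# Transformations of age of onset mentioned in the text.
TRANSFORMS = {
    "none": lambda y: y,
    "log": math.log,
    "sqrt": math.sqrt,
}


def fit_onset(
    repeats: Sequence[float],
    onset_ages: Sequence[float],
    transform: str = "none",
) -> Tuple[float, float, float]:
    """Least-squares fit of f(age of onset) = intercept + slope * repeats.

    Returns (slope, intercept, r_squared) on the transformed scale.
    """
    if len(repeats) != len(onset_ages) or len(repeats) < 3:
        raise ValueError("need at least three paired observations")
    f = TRANSFORMS[transform]
    x = [float(v) for v in repeats]
    y = [f(a) for a in onset_ages]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # r^2 is undefined when every (transformed) age is identical.
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Invented illustrative values, not patient data.
    cag = [44, 47, 50, 53, 58, 65]
    onset = [45, 40, 36, 31, 25, 16]
    for t in ("none", "log", "sqrt"):
        print(t, fit_onset(cag, onset, t))
```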


B. Results 

1. Family Studies

All affected individuals from the nine known SCA1 kindreds had an
expanded trinucleotide repeat on one of their alleles. No repeat expansions were
observed among eight kindreds previously shown by linkage analyses not to be
SCA1. These eight kindreds were examined for the SCA1 gene expansion to
confirm the linkage results.

Among the 70 other dominant ataxia families analyzed, three (4%)
were found to have an expanded CAG repeat on one of the SCA1 alleles. Of all
the dominant kindreds studied, 12 of 87 (14%) have an expanded CAG repeat at the
SCA1 locus. While the sample size is relatively small, and both estimates are
arguably biased to exclude or select for SCA1 kindreds, expanded CAG repeat tracts
within the SCA1 gene clearly account for only a small fraction of this complex
group of diseases. The distributions of CAG repeat number in normal controls
and in ataxic individuals who did not have an expansion were similar (data not
shown). These data argue against the involvement of the CAG repeat at the SCA1
locus in these families. However, it is still possible that some of these small
families have other mutations at the SCA1 locus.

The typical clinical findings in the genetically proven SCA1 kindreds
were gait and limb ataxia, dysarthria, pyramidal tract signs (spasticity, hyperreflexia,
extensor plantar responses) and variable degrees of oculomotor findings, which
include one or more of the following: nystagmus, slow saccades, and
ophthalmoparesis. In the later stages of the disease course, bulbar findings consistent
with dysfunction of cranial nerves IX, X, and XII became evident. Dystonic
posturing and involuntary movements, including choreoathetosis, also became apparent in
the later stages of the disease. Motor weakness, amyotrophy, and mild sensory
deficits manifested as proprioceptive loss were also detected. Although ataxia,
dysarthria and cranial nerve dysfunction were consistently present in every SCA1-affected
individual, considerable intrafamilial variability was noted with regard to
all of the other clinical features. Juvenile onset (< 18 years) was observed in four
kindreds. Of interest is the finding that juvenile-onset cases typically inherited the
disease gene from an affected father. Several of the kindreds that did not have an
expanded SCA1 CAG repeat displayed the same clinical findings as those observed
in SCA1 kindreds, confirming the inherent difficulty in clinically classifying this
group of disorders. While it is possible that some of these kindreds have other
mutations at the SCA1 locus, the disease locus (loci) for eight of these families has
also been excluded from the SCA1 region by linkage analyses.

2. Repeat Analysis on Normal and SCA1 Chromosomes 

Figure 20 shows the size distribution of the CAG repeats on 304
chromosomes from unaffected control individuals who are at risk for ataxia, and 113
expanded alleles from individuals affected with the disease. The normal alleles
range in size from 19 to 36 CAG repeat units. Over 95% of the normal alleles
contain from 25 to 33 CAG repeat units, the majority (65%) of which contain 28 to
30 repeats. The mean repeat sizes on normal chromosomes for the African
American, Caucasian, and South African populations are very similar: 29.1,
29.8, and 29.4 CAG repeat units, respectively. Combined heterozygosity for the
CAG repeat at the SCA1 locus was 0.809 for the populations examined, giving an
overall polymorphism information content (P.I.C.) value of 0.787. No change in
CAG repeat length was observed for 135 meioses of SCA1 alleles containing CAG
repeat tracts within the normal range, i.e., all were inherited in a Mendelian fashion.
In contrast, 41 of the 62 meioses involving expanded SCA1 alleles changed in
repeat size. The rate of repeat instability was 60% for female meioses and
82% for male meioses.
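The heterozygosity and P.I.C. values quoted above are derived from allele frequencies. The text does not say which estimator was used, so the sketch below assumes the common expected-heterozygosity formula H = 1 - Σ p_i² and the standard P.I.C. formula of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The allele counts in the usage example are hypothetical, not the study's data, so its output will not reproduce 0.809 and 0.787.

```python
from itertools import combinations
from typing import Mapping, Sequence


def allele_frequencies(counts: Mapping[int, int]) -> list:
    """Convert {repeat number: chromosome count} into allele frequencies."""
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - pair_term


if __name__ == "__main__":
    # Hypothetical repeat-size counts on normal chromosomes (repeat number -> count).
    counts = {26: 10, 28: 30, 29: 35, 30: 25, 32: 10}
    f = allele_frequencies(counts)
    print(round(expected_heterozygosity(f), 3), round(pic(f), 3))
```

Because the pair term is always non-negative, P.I.C. can never exceed heterozygosity, which matches the relationship between the two values reported above (0.787 versus 0.809).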
The number of CAG repeats found on SCA1 chromosomes from 113
affected individuals was always greater than the number of repeats on normal
chromosomes, ranging from 42 to 81 with a mean of 52.6 (Figure 20).

All patents, patent documents, and publications cited herein are
incorporated by reference. The foregoing detailed description and examples have
been given for clarity of understanding only. No unnecessary limitations are to be
understood therefrom. The invention is not limited to the exact details shown and
described, for variations obvious to one skilled in the art will be included within the
invention defined by the claims.




WHAT IS CLAIMED IS:

1. A nucleic acid molecule containing a CAG repeat region of an isolated 
autosomal dominant spinocerebellar ataxia type 1 (SCA1) gene, said gene 
located within the short arm of chromosome 6. 

2. The nucleic acid molecule of claim 1 corresponding to the entire SCA1 gene. 

3. The nucleic acid molecule of claim 1 wherein the SCA1 gene encodes 
ataxin-1. 

4. The nucleic acid molecule of claim 3 of about 2.4-11 kb in length containing
the coding region of the SCA1 gene. 

5. The nucleic acid molecule of claim 1 wherein the CAG repeat region is
represented by (CAG)n and n = 2-36.

6. The nucleic acid molecule of claim 5 wherein n = 19-36.

7. The nucleic acid molecule of claim 1 wherein the CAG repeat region is
represented by (CAG)n and n > 36.

8. The nucleic acid molecule of claim 7 wherein n > 43.

9. The nucleic acid molecule of claim 1 wherein the molecule is a
single-stranded polynucleotide.

10. The nucleic acid molecule of claim 9 wherein the single-stranded
polynucleotide is cDNA.

11. The nucleic acid molecule of claim 9 wherein the single-stranded
polynucleotide is mRNA.
12. The nucleic acid molecule of claim 1 wherein the nucleic acid is genomic
DNA.

13. An isolated oligonucleotide that hybridizes to a nucleic acid molecule
containing a CAG repeat region of an isolated SCA1 gene; said
oligonucleotide having at least about 11 nucleotides.

14. The isolated oligonucleotide of claim 13 having at least about 16 nucleotides. 

15. The isolated oligonucleotide of claim 14 having no more than about 35 
nucleotides. 

16. The isolated oligonucleotide of claim 13 that produces a primed product of 
about 70-350 base pairs. 

17. The isolated oligonucleotide of claim 16 that produces a primed product of 
about 100-300 base pairs. 

18. The isolated oligonucleotide of claim 13 that hybridizes to the nucleic acid 
molecule within about 150 nucleotides on either side of the CAG repeat 
region. 

19. The isolated oligonucleotide of claim 18 that hybridizes to the nucleic acid 
molecule directly adjacent to the (CAG) n region. 

20. The isolated oligonucleotide of claim 13 having at least about 100 nucleotides. 

21. The isolated oligonucleotide of claim 20 having at least about 200 nucleotides. 

22. The isolated oligonucleotide of claim 13 comprising a nucleotide sequence
selected from the group consisting of CCGGAGCCCTGCTGAGGT (CAG-a),
CCAGACGCCGGGACAC (CAG-b), AACTGGAAATGTGGACGTAC
(Rep-1), CAACATGGGCAGTCTGAG (Rep-2),
CCACCACTCCATCCCAGC (GCT-435), TGCTGGGCTGGTGGGGGG
(GCT-214), CTCTCGGCTTTCTTGGTG (Pre-1), and
GTACGTCCACATTTCCAGTT (Pre-2).

23. A method for detecting the presence of a DNA molecule containing a CAG 
repeat region of the SCA1 gene comprising: 

(a) digesting genomic DNA with a restriction endonuclease to obtain DNA 
fragments; 

(b) probing said DNA fragments under hybridizing conditions with a 
detectably labeled gene probe, which hybridizes to a nucleic acid 
molecule containing a CAG repeat region of an isolated SCA1 gene
having at least about 11 nucleotides;

(c) detecting probe DNA which has hybridized to said DNA fragments; and

(d) analyzing the DNA fragments for a CAG repeat region characteristic of
the normal or affected forms of the SCA1 gene.

24. The method of claim 23 wherein the step of analyzing comprises analyzing for 
a (CAG)n region wherein n > 36. 

25. The method of claim 24 wherein the step of analyzing comprises analyzing for 
a (CAG)n region wherein n > 43.

26. The method of claim 23 wherein the detectably labelled DNA sequence 
comprises a portion of an EcoRI fragment of the SCA1 gene.

27. The method of claim 26 wherein the EcoRI fragment comprises about 3360 base
pairs.

28. A method for detecting the presence of a DNA molecule located within an
affected allele of the SCA1 gene comprising: 

(a) treating separate complementary strands of a DNA molecule containing 
a CAG repeat region of the SCA1 gene with a molar excess of two 
oligonucleotide primers; 

(b) extending the primers to form complementary primer extension 
products which act as templates for synthesizing the desired molecule 
containing the CAG repeat region; 

(c) detecting the molecule so amplified; and 

(d) analyzing the amplified molecule for a CAG repeat region characteristic 
of the SCA1 disorder. 

29. The method of claim 28 wherein the step of analyzing comprises analyzing for
a (CAG)n region wherein n > 36.

30. The method of claim 29 wherein the step of analyzing comprises analyzing for
a (CAG)n region wherein n ≥ 43.

31. A protein encoded by the SCA1 gene having therein a glutamine repeat 
region. 

32. The protein of claim 31 having a molecular weight of about 20-90 kD.

33. The protein of claim 31 having the amino acid sequence shown in Figure 15.

34. An antibody to a protein encoded by DNA containing a CAG repeat region of 
the SCA1 gene. 

35. A method for detecting the SCA1 disorder comprising: 

(a) contacting an antibody to a protein encoded by the SCA1 gene with a 
biological sample containing antigenic protein to form an antibody- 
antigen complex; 

(b) isolating the antibody-antigen complex; and 
(c) sequencing the antigen portion of the antibody-antigen complex using
amino acid sequencing techniques. 
[Figure: nucleotide sequence in rows of 60 nucleotides, numbered through position 3361; the sequence text is not legible in the scanned copy and is not reproduced here.]
Patient #1: (CAG)nCACCTCAGCAGGGCTCCGGGGCTCATC; n = 56.

Patient #2: (CAG)nCACCTCAGCAGGGCTCCGGGGCTCATC; n = 69.

Patient #3: (CAG)nCACCTCAGCAGGGCTCCGGGGCTCATC; n = 47.

Patient #4: (CAG)nCACCTCAGCAGGGCTCCGGGGCTCATC; n = 48.

Patient #5: TGAG(CAG)n; n = 50.

[The full expanded-repeat sequences for each patient are not legible in the scanned copy.]
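Each patient sequence above is summarized as (CAG)n with its flanking sequence. A small helper that measures the longest uninterrupted CAG run in a read and compares it with the n > 36 cutoff used in claims 7, 24 and 29 is sketched below. The function names and the regular-expression approach are illustrative choices; a real read would also have to be checked for sequencing errors and for interruptions within the repeat.

```python
import re

# One or more consecutive CAG triplets.
CAG_RUN = re.compile(r"(?:CAG)+")


def longest_cag_run(seq: str) -> int:
    """Return the number of CAG triplets in the longest uninterrupted (CAG)n run."""
    return max((len(m.group()) // 3 for m in CAG_RUN.finditer(seq.upper())), default=0)


def exceeds_threshold(seq: str, threshold: int = 36) -> bool:
    """True if the longest CAG run is longer than `threshold` repeats (n > threshold)."""
    return longest_cag_run(seq) > threshold


if __name__ == "__main__":
    flank = "CACCTCAGCAGGGCTCCGGGGCTCATC"  # 3' flank shown for Patients #1 to #4
    for n in (29, 47, 56, 69):
        read = "CAG" * n + flank
        print(n, longest_cag_run(read), exceeds_threshold(read))
```

The short CAGCAG inside the flank forms a separate run of two triplets, so it does not inflate the count for the main repeat tract.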
[Figure: nucleotide sequence (positions 1 to about 506) of the region containing the CAG repeat, annotated with the positions of primers CAG-a, CAG-b, GCT-214, Rep-1 and Pre-2; the sequence text is not legible in the scanned copy and is not reproduced here.]
FIGURE 15

[Nucleotide sequence of the SCA1 cDNA, numbered through nucleotide 10660 and ending in a poly(A) tract, with the deduced amino acid sequence of ataxin-1 shown beneath the coding region (continued over several pages). The sequence text is not legible in the scanned copy and is not reproduced here.]